Abstract
In reinforcement learning (RL), agents often struggle to perform well on tasks that differ from those encountered during training. This limitation presents a challenge to the broader deployment of RL in diverse and dynamic task settings. In this work, we introduce memory augmentation, a memory-based RL approach to improve task generalization. Our approach leverages task-structured augmentations to simulate plausible out-of-distribution scenarios and incorporates memory mechanisms to enable context-aware policy adaptation. Trained on a predefined set of tasks, our policy demonstrates the ability to generalize to unseen tasks through memory augmentation without requiring additional interactions with the environment. Through extensive simulation experiments and real-world hardware evaluations on legged locomotion tasks, we demonstrate that our approach achieves zero-shot generalization to unseen tasks while maintaining robust in-distribution performance and high sample efficiency.
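The abstract gives only a high-level description of the two ingredients it names, task-structured augmentations and a memory mechanism, so the following is a minimal sketch rather than the authors' implementation. The class and function names (MemoryPolicy, augment_history), the GRU-based memory, the observation/action dimensions, and the scaling augmentation are all illustrative assumptions chosen to mirror a legged-locomotion setup.

```python
import torch
import torch.nn as nn


class MemoryPolicy(nn.Module):
    """Hypothetical memory-based policy: a GRU over the observation history
    supplies the task context that conditions the action head."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, act_dim),
        )

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        # obs_history: (batch, time, obs_dim)
        memory, _ = self.gru(obs_history)
        return self.head(memory[:, -1])  # act on the most recent memory state


def augment_history(obs_history: torch.Tensor, scale: float) -> torch.Tensor:
    """Illustrative task-structured augmentation (an assumption, not the paper's
    exact transform): rescale the stored history as if it were generated by a
    task outside the training distribution, so the policy's memory is exposed
    to plausible out-of-distribution contexts without new environment rollouts."""
    return obs_history * scale


if __name__ == "__main__":
    policy = MemoryPolicy(obs_dim=48, act_dim=12)  # e.g., quadruped joint commands
    history = torch.randn(4, 50, 48)               # batch of 50-step observation histories
    actions_in = policy(history)                              # in-distribution context
    actions_ood = policy(augment_history(history, scale=1.5))  # simulated unseen task
    print(actions_in.shape, actions_ood.shape)
```

Under these assumptions, generalization to unseen tasks comes from training the recurrent policy on both original and augmented histories, so no additional environment interaction is needed at deployment time.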
URL
https://arxiv.org/abs/2502.01521