Abstract
The success of many RL techniques relies heavily on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structure of a task, DrS learns a high-quality dense reward from sparse rewards and, if available, demonstrations. The learned rewards can be \textit{reused} in unseen tasks, thus reducing the human effort required for reward engineering. Extensive experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks, improving the performance and sample efficiency of RL algorithms. On some tasks, the learned rewards even achieve performance comparable to human-engineered rewards. See our project page (this https URL) for more details.
URL
https://arxiv.org/abs/2404.16779