
DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks

2024-04-25 17:28:33
Tongzhou Mu, Minghua Liu, Hao Su

Abstract

The success of many RL techniques heavily relies on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structures of the task, DrS learns a high-quality dense reward from sparse rewards and demonstrations (if given). The learned rewards can be reused in unseen tasks, thus reducing the human effort for reward engineering. Extensive experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks, resulting in improved performance and sample efficiency of RL algorithms. The learned rewards even achieve comparable performance to human-engineered rewards on some tasks. See our project page (this https URL) for more details.
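Since the abstract only describes the idea at a high level, the sketch below illustrates what a stage-structured dense reward can look like: a semi-sparse stage index plus a within-stage shaping term. Everything in it (the toy 1-D "reach, then push" setup, the thresholds, and the hand-written shaping that stands in for the learned, data-driven component DrS would train) is an illustrative assumption, not the paper's actual formulation.

```python
# Illustrative sketch only: a stage-structured dense reward for a toy
# 1-D "reach the object, then push it to the goal" task. The stage
# indicator plays the role of the sparse reward; the hand-written shaping
# terms are hypothetical stand-ins for the learned components in DrS.

OBJ_START, GOAL_POS, EPS = 0.5, 1.0, 0.05

def stage_indicator(ee_pos, obj_pos):
    """Index of the highest completed stage (the sparse, per-stage signal)."""
    if abs(obj_pos - GOAL_POS) < EPS:
        return 2          # stage 2: object at goal (task solved)
    if abs(ee_pos - obj_pos) < EPS:
        return 1          # stage 1: end-effector has reached the object
    return 0              # stage 0: nothing accomplished yet

def stage_shaping(ee_pos, obj_pos, stage):
    """Within-stage progress in [0, 1); a hand-written stand-in for a learned reward."""
    if stage == 0:
        return 1.0 - min(abs(ee_pos - obj_pos), 1.0)    # move toward the object
    if stage == 1:
        return 1.0 - min(abs(obj_pos - GOAL_POS), 1.0)  # push toward the goal
    return 0.0                                          # final stage: nothing left to shape

def dense_reward(ee_pos, obj_pos):
    """Stage index + within-stage shaping: every state in stage k+1 scores
    higher than any state in stage k, so the sparse-task optimum is preserved."""
    k = stage_indicator(ee_pos, obj_pos)
    return k + stage_shaping(ee_pos, obj_pos, k)

if __name__ == "__main__":
    print(dense_reward(0.0, OBJ_START))   # ~0.5 (stage 0, halfway to the object)
    print(dense_reward(0.5, OBJ_START))   # 1.5  (stage 1, object halfway to goal)
    print(dense_reward(1.0, GOAL_POS))    # 2.0  (stage 2, solved)
```

The property this toy example highlights is the monotone ordering across stages: because the within-stage term never exceeds the gap between stage indices, a learned per-stage signal can be dropped in without changing which states the sparse task considers best.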

Abstract (translated)

The success of many reinforcement learning (RL) techniques relies heavily on human-engineered dense rewards, which typically demand deep domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel data-driven approach for learning reusable dense rewards for multi-stage tasks. By leveraging the stage structure of a task, DrS learns high-quality dense rewards from sparse rewards and demonstrations. The learned rewards can be reused in unseen tasks, reducing the human effort of reward engineering. Extensive experiments on three physical robot manipulation task families (with 1000+ task variants) show that our learned rewards can be reused in unseen tasks, improving the performance and sample efficiency of RL algorithms. On some tasks, the learned rewards even achieve performance comparable to human-engineered rewards. For more details, see our project page (this https URL).

URL

https://arxiv.org/abs/2404.16779

PDF

https://arxiv.org/pdf/2404.16779.pdf

