
Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

2024-05-06 11:33:12
Stone Tao, Arth Shukla, Tse-kai Chan, Hao Su

Abstract

Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction, but often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. One direction is to augment RL with offline data demonstrating the desired tasks, but past work often requires a large amount of high-quality demonstration data that is difficult to obtain, especially in domains such as robotics. Our approach consists of a reverse curriculum followed by a forward curriculum. Unique to our approach compared to past work is the ability to efficiently leverage more than one demonstration through a per-demonstration reverse curriculum generated via state resets. The result of our reverse curriculum is an initial policy that performs well on a narrow initial state distribution and helps overcome difficult exploration problems. A forward curriculum is then used to accelerate the training of the initial policy to perform well on the full initial state distribution of the task and improve demonstration and sample efficiency. We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency compared against various state-of-the-art learning-from-demonstration baselines, even solving previously unsolvable tasks that require high precision and control.
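
To make the curriculum structure concrete, below is a minimal, illustrative Python sketch of the kind of per-demonstration reverse curriculum the abstract describes: each demonstration keeps its own reset index, episodes begin from the demonstration state at that index, and the index moves toward the start of the demonstration once the policy succeeds reliably from there. The class and parameter names (ReverseCurriculum, step_back, success_threshold) are assumptions made for illustration, not the paper's actual implementation.

import numpy as np

class ReverseCurriculum:
    """Sketch of a per-demonstration reverse curriculum driven by state resets."""

    def __init__(self, demo_lengths, step_back=1, success_threshold=0.9, window=10):
        # Start each demonstration's curriculum at its last state (easiest: near the goal).
        self.reset_idx = [length - 1 for length in demo_lengths]
        self.successes = [[] for _ in demo_lengths]
        self.step_back = step_back
        self.success_threshold = success_threshold
        self.window = window

    def sample(self, rng):
        # Pick a demonstration uniformly; return (demo id, state index to reset the env to).
        demo_id = int(rng.integers(len(self.reset_idx)))
        return demo_id, self.reset_idx[demo_id]

    def update(self, demo_id, success):
        # Record the episode outcome; once the recent success rate from the current
        # reset state is high enough, move that demonstration's reset point earlier.
        history = self.successes[demo_id]
        history.append(float(success))
        if len(history) >= self.window and np.mean(history[-self.window:]) >= self.success_threshold:
            self.reset_idx[demo_id] = max(0, self.reset_idx[demo_id] - self.step_back)
            history.clear()

    def done(self):
        # The reverse stage is finished once every demo resets from its first state.
        return all(t == 0 for t in self.reset_idx)

# Example usage (hypothetical training loop bookkeeping):
# rng = np.random.default_rng(0)
# curriculum = ReverseCurriculum(demo_lengths=[40, 55, 32])
# demo_id, reset_t = curriculum.sample(rng)   # reset the env to demo `demo_id` at step `reset_t`
# curriculum.update(demo_id, success=True)    # report the episode outcome

Per the abstract, once this reverse stage yields a policy that succeeds from the demonstrations' early states, a forward curriculum over the task's full initial state distribution takes over to generalize that initial policy; the details of that stage are left to the paper.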

URL

https://arxiv.org/abs/2405.03379

PDF

https://arxiv.org/pdf/2405.03379.pdf
