Paper Reading AI Learner

Rank2Reward: Learning Shaped Reward Functions from Passive Video

2024-04-23 04:31:30
Daniel Yang, Davin Tjia, Jacob Berg, Dima Damen, Pulkit Agrawal, Abhishek Gupta

Abstract

Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to provide raw, action-free visual data of tasks being performed. Moreover, this data can even be mined from video datasets or the web. Ideally, this data can serve to guide robot learning for new tasks in novel environments, informing both "what" to do and "how" to do it. A powerful way to encode both the "what" and the "how" is to infer a well-shaped reward function for reinforcement learning. The challenge is determining how to ground visual demonstration inputs into a well-shaped and informative reward function. We propose Rank2Reward, a technique for learning behaviors from videos of tasks being performed without access to any low-level states and actions. We do so by leveraging the videos to learn a reward function that measures incremental "progress" through a task by learning how to temporally rank the video frames in a demonstration. By inferring an appropriate ranking, the reward function is able to guide reinforcement learning by indicating when task progress is being made. This ranking function can be integrated into an adversarial imitation learning scheme, resulting in an algorithm that can learn behaviors without exploiting the learned reward function. We demonstrate the effectiveness of Rank2Reward at learning behaviors from raw video on a number of tabletop manipulation tasks, both in simulation and on a real-world robotic arm. We also demonstrate how Rank2Reward can be easily extended to web-scale video datasets.
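The core idea described above, learning a utility that temporally ranks demonstration frames so that later frames score higher, can be illustrated with a toy example. The sketch below is not the authors' implementation: it uses hypothetical hand-made per-frame features (instead of raw video) and a simple Bradley-Terry pairwise ranking loss trained with plain gradient ascent in NumPy, to show how a learned ranking yields a monotone "progress" signal usable as a shaped reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy demonstration: T frames whose (hypothetical) D-dim features encode
# true task progress, observed with noise. In Rank2Reward these features
# would instead come from images of the task being performed.
T, D = 50, 4
progress = np.linspace(0.0, 1.0, T)
feats = np.stack(
    [progress + 0.05 * rng.standard_normal(T) for _ in range(D)], axis=1
)

w = np.zeros(D)  # linear utility u(x) = w @ x (a stand-in for a network)

def pairwise_ranking_step(w, feats, lr=0.5):
    """One pass over all ordered frame pairs (i earlier than j).

    Bradley-Terry objective: maximize log sigmoid(u_j - u_i), pushing
    later frames to rank above earlier ones.
    """
    T = feats.shape[0]
    for i in range(T):
        for j in range(i + 1, T):
            diff = feats[j] - feats[i]
            p = 1.0 / (1.0 + np.exp(-w @ diff))  # P(frame j ranked above i)
            w = w + lr * (1.0 - p) * diff        # gradient ascent step
    return w

for _ in range(20):
    w = pairwise_ranking_step(w, feats)

# The learned utility acts as a shaped progress reward: it should
# increase through the demonstration.
utility = feats @ w
print(utility[0] < utility[T // 2] < utility[-1])
```

In the full method this progress signal is combined with an adversarial classifier so that a policy cannot exploit the ranking network on out-of-distribution states; the sketch only covers the ranking component.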

Abstract (translated)

Teaching robots novel skills via human-in-the-loop data collection techniques such as kinesthetic teaching or teleoperation places a heavy burden on human supervisors. In contrast, it is much easier to provide raw, action-free visual data of tasks being performed. Moreover, such data can even be mined from video datasets or the web. Ideally, this data can guide robot learning for new tasks in novel environments, informing both what to do and how to do it. Inferring a well-shaped, informative reward function is a powerful way to encode both the "what" and the "how". The challenge lies in grounding visual demonstration inputs into such a reward function. We propose Rank2Reward, a technique for learning behaviors from videos of tasks being performed, without access to any low-level states or actions. By learning to temporally rank the frames of a demonstration video, we infer an appropriate ranking, allowing the reward function to guide reinforcement learning by indicating when task progress is being made. This ranking function can be integrated into an adversarial imitation learning scheme, yielding an algorithm that learns behaviors without exploiting the learned reward function. We demonstrate the effectiveness of Rank2Reward at learning behaviors on a number of tabletop manipulation tasks, both in simulation and on a real-world robotic arm. We also show that Rank2Reward extends easily to web-scale video datasets.

URL

https://arxiv.org/abs/2404.14735

PDF

https://arxiv.org/pdf/2404.14735.pdf
