Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

2024-05-02 17:59:31
Murtaza Dalal, Tarun Chiruvolu, Devendra Chaplot, Ruslan Salakhutdinov

Abstract

Large Language Models (LLMs) have been shown to be capable of performing high-level planning for long-horizon robotics tasks, yet existing methods require access to a pre-defined skill library (e.g. picking, placing, pulling, pushing, navigating). However, LLM planning does not address how to design or learn those behaviors, which remains challenging particularly in long-horizon settings. Furthermore, for many tasks of interest, the robot needs to be able to adjust its behavior in a fine-grained manner, requiring the agent to be capable of modifying low-level control actions. Can we instead use the internet-scale knowledge from LLMs for high-level policies, guiding reinforcement learning (RL) policies to efficiently solve robotic control tasks online without requiring a pre-determined set of skills? In this paper, we propose Plan-Seq-Learn (PSL): a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control for solving long-horizon robotics tasks from scratch. We demonstrate that PSL achieves state-of-the-art results on over 25 challenging robotics tasks with up to 10 stages. PSL solves long-horizon tasks from raw visual input spanning four benchmarks at success rates of over 85%, out-performing language-based, classical, and end-to-end approaches. Video results and code at this https URL


URL

https://arxiv.org/abs/2405.01534

PDF

https://arxiv.org/pdf/2405.01534.pdf

