Retrieval-Augmented Embodied Agents

2024-04-17 18:57:48
Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang

Abstract

Embodied agents operating in complex and uncertain environments face considerable challenges. While some advanced agents handle complex manipulation tasks with proficiency, their success often hinges on extensive training data to develop their capabilities. In contrast, humans typically rely on recalling past experiences and analogous situations to solve new problems. Aiming to emulate this human approach in robotics, we introduce the Retrieval-Augmented Embodied Agent (RAEA). This innovative system equips robots with a form of shared memory, significantly enhancing their performance. Our approach integrates a policy retriever, allowing robots to access relevant strategies from an external policy memory bank based on multi-modal inputs. Additionally, a policy generator is employed to assimilate these strategies into the learning process, enabling robots to formulate effective responses to tasks. Extensive testing of RAEA in both simulated and real-world scenarios demonstrates its superior performance over traditional methods, representing a major leap forward in robotic technology.
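The abstract does not say how the policy retriever selects entries from the policy memory bank. As a rough illustration only, the sketch below assumes the multi-modal input has already been encoded into a single vector and that retrieval is a plain cosine-similarity nearest-neighbor lookup. The function names, the memory bank layout, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_policies(query, memory_bank, k=2):
    """Return the k stored policies whose embeddings are most similar to the query.

    Assumption: each memory entry is a dict with an "embedding" vector and a
    "policy" label. The paper's actual memory format is not given in the abstract.
    """
    ranked = sorted(
        memory_bank,
        key=lambda entry: cosine(query, entry["embedding"]),
        reverse=True,
    )
    return [entry["policy"] for entry in ranked[:k]]

# Hypothetical memory bank with toy 3-dimensional embeddings.
memory_bank = [
    {"embedding": [1.0, 0.0, 0.0], "policy": "pick_up_cup"},
    {"embedding": [0.9, 0.1, 0.0], "policy": "pick_up_mug"},
    {"embedding": [0.0, 1.0, 0.0], "policy": "open_drawer"},
]

# Hypothetical encoded observation plus instruction.
query = [1.0, 0.0, 0.1]
top = retrieve_policies(query, memory_bank, k=2)
print(top)
```

In a full system of the kind the abstract describes, the retrieved policies would then be passed to the policy generator together with the current observation. That step is omitted here because the abstract gives no detail about it.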

URL

https://arxiv.org/abs/2404.11699

PDF

https://arxiv.org/pdf/2404.11699.pdf

