Paper Reading AI Learner

Voyager: An Open-Ended Embodied Agent with Large Language Models

2023-05-25 17:46:38
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar


We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at this https URL.

Abstract (translated)

我们介绍了Voyager,它是Minecraft中第一个使用LLM技术 embodied 的 lifelong learningagent,能够持续探索世界、获取多样化的技能,并且在没有人类干预的情况下发现新的事物。Voyager由三个关键组件组成:1) 最大化探索的自动课程,2) 不断增长的技能库,用于存储和检索复杂的行为,3) 新的迭代prompt机制,其中包括环境反馈、执行错误和自我验证,用于改进程序。Voyager通过黑盒查询与GPT-4交互,无需调整模型参数即可绕过了模型参数微调的需求。由Voyager开发的技能具有时间上的扩展、可解释性和组合性,这迅速增强了agent的能力,并减轻了灾难性遗忘。实证研究表明,Voyager表现出Context-based in-game lifelong learning能力,并在Minecraft游戏中表现出非凡的技能水平。它获得了3.3倍的更独特物品、旅行2.3倍的更长距离,并且比先前的SOTA更快地解锁关键科技树里程碑。Voyager能够利用已学习的技能库在一个新的Minecraft世界中解决全新的任务,而其他技术则难以通用。我们公开了我们完整的代码库和prompts,这些prompts位于这个httpsURL上。



3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot