SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning

2023-05-24 18:14:35
Yue Wu, So Yeon Min, Shrimai Prabhumoye, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Tom Mitchell, Yuanzhi Li

Abstract

Open-world survival games pose significant challenges for AI algorithms due to their multi-tasking, deep exploration, and goal prioritization requirements. Although reinforcement learning (RL) is popular for solving games, its high sample complexity limits its effectiveness in complex open-world games like Crafter or Minecraft. We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM). Prompted with the LaTeX source as game context and a description of the agent's current observation, our SPRING framework employs a directed acyclic graph (DAG) with game-related questions as nodes and dependencies as edges. We identify the optimal action to take in the environment by traversing the DAG and calculating LLM responses for each node in topological order, with the LLM's answer to the final node directly translating to environment actions. In our experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4, without any training, outperforms all state-of-the-art RL baselines trained for 1M steps. Finally, we show the potential of games as a test bed for LLMs.
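The DAG traversal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three questions, their dependencies, the prompt layout, the function name answer_dag, and the stub_llm stand-in for a real model call are all assumptions made for illustration. It only shows the general pattern of answering each question node in topological order, feeding earlier answers into later prompts, and reading the action off the final node.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Tuple

# Hypothetical question nodes; the paper's actual questions are not reproduced here.
QUESTIONS: Dict[str, str] = {
    "observe": "What does the agent currently see?",
    "requirements": "Which requirements for the top goal are already met?",
    "action": "Which single action should the agent take next?",
}

# Hypothetical edges: each node maps to the nodes whose answers it depends on.
DEPENDENCIES: Dict[str, List[str]] = {
    "observe": [],
    "requirements": ["observe"],
    "action": ["observe", "requirements"],
}


def answer_dag(
    context: str,
    observation: str,
    llm: Callable[[str], str],
    final_node: str = "action",
) -> Tuple[str, Dict[str, str]]:
    """Answer every question in topological order and return the final answer."""
    answers: Dict[str, str] = {}
    # static_order() yields each node only after all of its dependencies.
    for node in TopologicalSorter(DEPENDENCIES).static_order():
        # Include the answers of the nodes this question depends on.
        earlier = "".join(
            f"Q: {QUESTIONS[dep]}\nA: {answers[dep]}\n" for dep in DEPENDENCIES[node]
        )
        prompt = f"{context}\n{observation}\n{earlier}Q: {QUESTIONS[node]}"
        answers[node] = llm(prompt)
    return answers[final_node], answers


def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns canned text so the sketch runs offline."""
    if prompt.endswith(QUESTIONS["action"]):
        return "collect_wood"  # placeholder action string, assumed for illustration
    return "stub answer"


action, answers = answer_dag(
    "<LaTeX source of the game paper>", "You see a tree to the north.", stub_llm
)
print(action)
```

In a real setup, stub_llm would be replaced by a call to an actual model, and the final answer would still need to be mapped onto the environment's action space; how SPRING does that mapping is not specified in this abstract.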

Abstract (translated)

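Open-world survival games are a major challenge for AI algorithms because they require handling multiple tasks at once, exploring deeply, and prioritizing goals. Reinforcement learning (RL) is widely used for games, but its high sample complexity limits how well it works in complex open-world games such as Crafter or Minecraft. We propose a new method, SPRING, which reads the game's original academic paper and uses a large language model (LLM) to reason with the knowledge learned and play the game. Given the LaTeX source as game context and a description of the agent's current observation, the SPRING framework uses a directed acyclic graph (DAG) whose nodes are game-related questions and whose edges are dependencies. By traversing the DAG and computing the LLM's response for each node in topological order, we determine the best action to take in the environment; the LLM's answer to the final node translates directly into an environment action. In our experiments, we study the quality of the in-context "reasoning" induced by different forms of prompts in the Crafter open-world environment. The experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential for completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4, without any training, outperforms all state-of-the-art RL baselines trained for 1M steps. Finally, we show the potential of games as a test bed for LLMs.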

URL

https://arxiv.org/abs/2305.15486

PDF

https://arxiv.org/pdf/2305.15486.pdf

