SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning

Abstract

Open-world survival games pose significant challenges for AI algorithms due to their multi-tasking, deep exploration, and goal prioritization requirements. Despite reinforcement learning (RL) being popular for solving games, its high sample complexity limits its effectiveness in complex open-world games like Crafter or Minecraft. We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM). Prompted with the LaTeX source as game context and a description of the agent's current observation, our SPRING framework employs a directed acyclic graph (DAG) with game-related questions as nodes and dependencies as edges. We identify the optimal action to take in the environment by traversing the DAG and calculating LLM responses for each node in topological order, with the LLM's answer to the final node directly translating to environment actions. In our experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4, without any training, outperforms all state-of-the-art RL baselines trained for 1M steps. Finally, we show the potential of games as a test bed for LLMs.
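The DAG traversal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `traverse_question_dag`, the three example questions, the prompt layout, and the `stub_llm` stand-in for a real LLM call are all hypothetical, chosen only to show questions being answered in topological order, with each prompt carrying the answers of the nodes it depends on and the final node's answer serving as the action.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def traverse_question_dag(
    questions: Dict[str, str],
    depends_on: Dict[str, List[str]],
    context: str,
    ask_llm: Callable[[str], str],
) -> Dict[str, str]:
    """Answer every question node in dependency order and return all answers."""
    # TopologicalSorter expects a mapping {node: predecessors}.
    graph = {node: depends_on.get(node, []) for node in questions}
    answers: Dict[str, str] = {}
    for node in TopologicalSorter(graph).static_order():
        # Include the Q/A pairs of the nodes this question depends on.
        prior = "\n".join(
            f"Q: {questions[dep]}\nA: {answers[dep]}"
            for dep in depends_on.get(node, [])
        )
        prompt = f"{context}\n{prior}\nQ: {questions[node]}\nA:"
        answers[node] = ask_llm(prompt)
    return answers


# Hypothetical questions; the paper's actual question set may differ.
questions = {
    "see": "What objects are nearby?",
    "need": "What is the most urgent requirement?",
    "action": "Which action should be taken next?",
}
depends_on = {"need": ["see"], "action": ["see", "need"]}

calls: List[str] = []


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption); records each prompt."""
    calls.append(prompt)
    return f"answer-{len(calls)}"


# The context would be the paper's LaTeX source plus the current observation.
answers = traverse_question_dag(
    questions, depends_on, "paper source + observation", stub_llm
)
final_action = answers["action"]  # the final node's answer becomes the action
print(final_action)
```

In a real agent, `stub_llm` would be replaced by a call to an LLM, and the final answer would still need to be mapped onto one of the environment's discrete actions.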

URL

https://arxiv.org/abs/2305.15486

PDF

https://arxiv.org/pdf/2305.15486.pdf