Abstract
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs). Previous works typically train a planner to imitate expert trajectories, treating planning as a supervised task. While these methods achieve competitive performance, they often lack sufficient robustness: once a suboptimal action is taken, the planner may encounter an out-of-distribution state, which can lead to task failure. In contrast, we frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption. To this end, we propose a closed-loop planner with an adaptation module and a novel hindsight method, designed to exploit as much available information as possible to assist the planner. Our experiments on the ALFRED dataset show that our planner achieves competitive performance under a few-shot assumption. For the first time, a few-shot agent's performance approaches and even surpasses that of the full-shot supervised agent.
URL
https://arxiv.org/abs/2412.19562