Abstract
Large language models (LLMs) have become increasingly popular in robotic task planning because of their strong text analysis and generation abilities and their broad knowledge of the world. However, they fall short in decoding visual cues: LLMs have limited direct perception of the world, which leaves them with a deficient grasp of its current state. By contrast, visual language models (VLMs) fill this gap by integrating visual perception modules, which can enhance the autonomy of robotic task planning. Despite these advances, VLMs still face challenges, such as task execution errors that can occur even when accurate instructions are provided. To address such issues, this paper proposes ReplanVLM, a framework for robotic task planning that focuses on error correction interventions. An internal error correction mechanism and an external error correction mechanism are presented to correct errors in their corresponding phases. A replan strategy is developed to replan tasks or correct erroneous code when task execution fails. Experimental results on real robots and in simulation environments demonstrate the superiority of the proposed framework, with higher success rates and robust error correction capabilities in open-world tasks. Videos of the experiments are available online.
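The interplay of the two correction mechanisms and the replan strategy can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the phases, so the sketch assumes that "internal" means checking generated code before execution and "external" means checking feedback after execution. All names (`run_with_replan`, `Attempt`, and the `fake_*` stand-ins for the VLM, the checker, and the robot) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    code: str    # the plan/code that failed
    phase: str   # "internal" or "external": where the error was caught (assumed phases)
    error: str   # description fed back to the planner on the next attempt


def run_with_replan(
    generate: Callable[[str, List[Attempt]], str],
    check_internal: Callable[[str], Optional[str]],
    execute: Callable[[str], Optional[str]],
    task: str,
    max_attempts: int = 3,
) -> Tuple[bool, List[Attempt]]:
    """Regenerate the plan until it passes both checks or attempts run out."""
    failures: List[Attempt] = []
    for _ in range(max_attempts):
        code = generate(task, failures)
        error = check_internal(code)      # assumed: pre-execution check
        if error is not None:
            failures.append(Attempt(code, "internal", error))
            continue                      # replan without executing
        error = execute(code)             # assumed: post-execution feedback
        if error is not None:
            failures.append(Attempt(code, "external", error))
            continue                      # replan after a failed execution
        return True, failures
    return False, failures


# Hypothetical stand-ins so the sketch runs without a model or a robot.
def fake_generate(task: str, failures: List[Attempt]) -> str:
    candidates = ["pick(", "pick('cup')", "pick('cup'); place('tray')"]
    return candidates[min(len(failures), len(candidates) - 1)]


def fake_check(code: str) -> Optional[str]:
    return None if code.count("(") == code.count(")") else "unbalanced parentheses"


def fake_execute(code: str) -> Optional[str]:
    return None if "place" in code else "object was not placed"


if __name__ == "__main__":
    ok, history = run_with_replan(fake_generate, fake_check, fake_execute, "put the cup on the tray")
    print(ok, [a.phase for a in history])
```

Passing the failure history back into `generate` is one simple way to let the planner condition on earlier errors. A real system would presumably include visual observations in that feedback as well.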
URL
https://arxiv.org/abs/2407.21762