Abstract
Embodied reasoning systems integrate robotic hardware and cognitive processes to perform complex tasks, typically in response to a natural language query about a specific physical environment. This usually involves either changing the belief about the scene or physically interacting with and changing the scene (e.g. 'Sort the objects from lightest to heaviest'). To facilitate the development of such systems, we introduce a new simulation environment that uses the MuJoCo physics engine and the high-quality renderer Blender to provide realistic visual observations that are also accurate to the physical state of the scene. Together with the simulator, we propose a new benchmark composed of 10 classes of multi-step reasoning scenarios that require simultaneous visual and physical measurements. Finally, we develop a new modular Closed Loop Interactive Reasoning (CLIER) approach that takes into account measurements of non-visual object properties, changes in the scene caused by external disturbances, and uncertain outcomes of robotic actions. We extensively evaluate our reasoning approach in simulation and on real-world manipulation tasks, achieving success rates above 76% and 64%, respectively.
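The closed-loop idea summarized above (measure a non-visual property, act, re-observe, and replan when an action fails or the scene is disturbed) can be illustrated with a toy sketch of the 'sort from lightest to heaviest' example. This is not the CLIER implementation: the `Scene` model, the noisy `measure_mass` scale, the failure probability of `swap`, and the `disturb` function are all hypothetical stand-ins chosen only to show the loop structure.

```python
import random
from dataclasses import dataclass


@dataclass
class Scene:
    # Hypothetical toy scene: object name -> true mass (kg), plus the current left-to-right order.
    masses: dict
    order: list


def measure_mass(scene, name, rng, noise=0.01):
    # Hypothetical noisy scale reading, standing in for a physical (non-visual) measurement.
    return scene.masses[name] + rng.gauss(0.0, noise)


def swap(scene, i, j, rng, fail_prob=0.2):
    # Hypothetical robot action with an uncertain outcome: it silently fails with probability fail_prob.
    if rng.random() >= fail_prob:
        scene.order[i], scene.order[j] = scene.order[j], scene.order[i]


def disturb(scene, rng, prob=0.1):
    # Hypothetical external disturbance: occasionally swaps two random objects.
    if rng.random() < prob and len(scene.order) > 1:
        i, j = rng.sample(range(len(scene.order)), 2)
        scene.order[i], scene.order[j] = scene.order[j], scene.order[i]


def closed_loop_sort(scene, rng, max_steps=200, samples=5):
    # Average several noisy readings per object to estimate its mass.
    estimates = {
        name: sum(measure_mass(scene, name, rng) for _ in range(samples)) / samples
        for name in scene.order
    }
    for step in range(max_steps):
        observed = list(scene.order)  # re-observe the scene on every iteration
        target = sorted(observed, key=estimates.get)
        if observed == target:
            return step  # goal verified from the latest observation
        # Replan: fix the first position that differs from the target order.
        i = next(k for k in range(len(observed)) if observed[k] != target[k])
        j = observed.index(target[i])
        swap(scene, i, j, rng)
        disturb(scene, rng)
    return -1  # step budget exhausted without verifying the goal


rng = random.Random(0)
scene = Scene(
    masses={"cup": 0.3, "can": 0.5, "book": 1.2, "brick": 2.0},
    order=["brick", "cup", "book", "can"],
)
steps = closed_loop_sort(scene, rng)
print(steps, scene.order)
```

The loop never trusts that an action succeeded: it checks the goal against a fresh observation each iteration, so a failed `swap` or a `disturb` event is simply corrected on the next pass.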
URL
https://arxiv.org/abs/2404.15194