Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity

Abstract

We present an embodied AI system that receives open-ended natural language instructions from a human and controls two arms to collaboratively accomplish potentially long-horizon tasks over a large workspace. Our system is modular: it deploys state-of-the-art Large Language Models for task planning, Vision-Language Models for semantic perception, and Point Cloud Transformers for grasping. With semantic and physical safety in mind, these modules are interfaced with a real-time trajectory optimizer and a compliant tracking controller to enable human-robot proximity. We demonstrate performance on the following tasks: bi-arm sorting, bottle opening, and trash disposal. These are done zero-shot: the models used have not been trained on any real-world data from this bi-arm robot, its scenes, or its workspace. Composing both learning- and non-learning-based components in a modular fashion with interpretable inputs and outputs allows the user to easily debug points of failure and fragility. One may also swap modules in place to improve the robustness of the overall platform, for instance with imitation-learned policies.
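The modularity described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Step`, `Pipeline`, `stub_planner`, `stub_perceiver`, `top_down_grasp`) is hypothetical, the stubs stand in for the language model, the vision-language model, and the grasping model, and the trajectory optimizer and controller are left out. The sketch only shows how modules with plain, inspectable inputs and outputs can be logged for debugging and swapped in place.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

Position = Tuple[float, float, float]


@dataclass
class Step:
    """One planned action (hypothetical format, not taken from the paper)."""
    arm: str      # "left" or "right"
    action: str   # e.g. "pick", "open"
    target: str   # object named in the instruction


@dataclass
class Pipeline:
    """Three swappable modules, each a plain function with readable I/O."""
    plan: Callable[[str], List[Step]]        # instruction -> ordered steps
    perceive: Callable[[str], Position]      # object name -> 3D position
    grasp: Callable[[Position], Position]    # object position -> grasp point

    def run(self, instruction: str) -> List[dict]:
        # Record every intermediate value so a failure can be traced
        # to the module that produced it.
        log = []
        for step in self.plan(instruction):
            position = self.perceive(step.target)
            grasp_point = self.grasp(position)
            log.append({"step": step, "position": position, "grasp": grasp_point})
        return log


def stub_planner(instruction: str) -> List[Step]:
    # Stand-in for a language-model planner: returns a fixed two-arm plan.
    return [Step("left", "pick", "bottle"), Step("right", "open", "bottle")]


def stub_perceiver(name: str) -> Position:
    # Stand-in for semantic perception: returns a fixed position in meters.
    return (0.4, 0.0, 0.1)


def top_down_grasp(position: Position) -> Position:
    # Stand-in for a grasp model: approach from 5 cm above the object.
    x, y, z = position
    return (x, y, z + 0.05)


pipeline = Pipeline(stub_planner, stub_perceiver, top_down_grasp)
log = pipeline.run("open the bottle")

# Swap only the grasp module; the planner and perceiver are untouched.
swapped = replace(pipeline, grasp=lambda position: position)
swapped_log = swapped.run("open the bottle")
```

Because each log entry stores the step, the perceived position, and the grasp point, a wrong grasp can be attributed to either perception or grasping by inspecting a single record.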

URL

https://arxiv.org/abs/2404.03570

PDF

https://arxiv.org/pdf/2404.03570.pdf
