Abstract
The growing presence of service robots in human-centric environments such as warehouses demands seamless and intuitive human-robot collaboration. In this paper, we propose a collaborative shelf-picking framework that combines multimodal interaction, physics-based reasoning, and task division for enhanced human-robot teamwork. The framework enables the robot to recognize human pointing gestures, interpret verbal cues and voice commands, and communicate through visual and auditory feedback. Moreover, it is powered by a Large Language Model (LLM) that applies Chain-of-Thought (CoT) reasoning and draws on a physics-based simulation engine for safely retrieving boxes from cluttered stacks on shelves, together with a relationship graph for sub-task generation, extraction sequence planning, and decision making. Finally, we validate the framework through three real-world shelf-picking experiments: 1) Gesture-Guided Box Extraction, 2) Collaborative Shelf Clearing, and 3) Collaborative Stability Assistance.
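To make the planning component concrete, the sketch below gives one plausible reading of the relationship-graph idea: boxes and their support relations form a directed graph, and a safe extraction sequence only removes boxes with nothing still resting on them. The function name, box labels, and the purely geometric stability check are all hypothetical; in the paper's framework, the check would instead query the physics-based simulation engine.

```python
from collections import defaultdict

def plan_extraction_sequence(supports, targets):
    """Order `targets` so each box is unobstructed when it is removed.

    `supports` is a list of (upper, lower) pairs meaning `upper` rests
    on `lower`. The "nothing rests on it" test below is a geometric
    stand-in for the paper's physics-based stability simulation.
    """
    resting_on = defaultdict(set)  # lower box -> boxes resting on it
    for upper, lower in supports:
        resting_on[lower].add(upper)

    sequence, removed = [], set()
    remaining = set(targets)
    while remaining:
        # A box is safe to pull once everything resting on it is gone.
        ready = sorted(b for b in remaining if not (resting_on[b] - removed))
        if not ready:
            raise ValueError("no safe extraction order for the remaining boxes")
        box = ready[0]
        sequence.append(box)
        removed.add(box)
        remaining.remove(box)
    return sequence

# Example: A rests on B, B rests on C, so the stack is cleared top-down.
print(plan_extraction_sequence([("A", "B"), ("B", "C")], ["A", "B", "C"]))
# -> ['A', 'B', 'C']
```

A simulation-backed variant would replace the `ready` test with a call into the physics engine, rejecting any candidate whose removal destabilizes the remaining stack.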
URL
https://arxiv.org/abs/2504.06593