Abstract
The integration of robotics and augmented reality (AR) holds transformative potential for advancing human-robot interaction (HRI), offering enhancements in usability, intuitiveness, accessibility, and collaborative task performance. This paper introduces and evaluates a novel multimodal AR-based robot puppeteer framework that enables intuitive teleoperation through a virtual counterpart, driven by large language model (LLM)-based voice commands and hand gesture interactions. Using the Meta Quest 3, users interact with a virtual counterpart robot in real time, effectively "puppeteering" its physical counterpart within an AR environment. We conducted a within-subject user study with 42 participants performing robotic cube pick-and-place tasks with pattern matching under two conditions: gesture-only interaction and combined voice-and-gesture interaction. We assessed both objective performance metrics and subjective user experience (UX) measures, including an extended comparative analysis between roboticists and non-roboticists. The results provide key insights into how multimodal input influences contextual task efficiency, usability, and user satisfaction in AR-based HRI. Our findings offer practical implications for designing effective AR-enhanced HRI systems.
URL
https://arxiv.org/abs/2506.13189