Abstract
Recent advances in multimodal Human-Robot Interaction (HRI) datasets have emphasized the fusion of speech and gesture, expanding robots' ability to absorb both explicit and implicit HRI insights. However, existing speech-gesture HRI datasets often focus on elementary tasks, such as object pointing and pushing, which limits their scalability to more intricate domains, and they prioritize human command data over records of robot behavior. To bridge these gaps, we introduce NatSGD, a multimodal HRI dataset of natural human commands, conveyed through speech and gestures and synchronized with demonstrations of robot behavior. NatSGD serves as a foundational resource at the intersection of machine learning and HRI research, and we demonstrate its effectiveness in training robots to understand tasks from multimodal human commands, highlighting the importance of jointly considering speech and gestures. We have released our dataset, simulator, and code to facilitate future research on human-robot interaction system learning; these resources are available at this https URL
URL
https://arxiv.org/abs/2403.02274