Communicating human intent to a robotic companion by multi-type gesture sentences

2023-03-08 09:02:12
Petr Vanc, Jan Kristof Behrens, Karla Stepanova, Vaclav Hlavac

Abstract

Human-robot collaboration in home and industrial workspaces is on the rise. However, the communication between robots and humans is a bottleneck. Although people use a combination of different types of gestures to complement speech, only a few robotic systems utilize gestures for communication. In this paper, we propose a gesture pseudo-language and show how multiple types of gestures can be combined to express human intent to a robot (i.e., expressing both the desired action and its parameters - e.g., pointing to an object and showing that the object should be emptied into a bowl). The demonstrated gestures and the perceived table-top scene (object poses detected by CosyPose) are processed in real time to extract the human's intent. We utilize behavior trees to generate reactive robot behavior that handles various possible states of the world (e.g., a drawer has to be opened before an object is placed into it) and recovers from errors (e.g., when the scene changes). Furthermore, our system enables switching between direct teleoperation of the end-effector and high-level operation using the proposed gesture sentences. The system is evaluated on increasingly complex tasks using a real 7-DoF Franka Emika Panda manipulator. Controlling the robot via action gestures lowered the execution time by up to 60%, compared to direct teleoperation.
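
The reactive behavior the abstract describes follows the standard behavior-tree pattern: condition and action leaves composed under sequence and fallback nodes, so a precondition such as "the drawer is open" is checked (and, if needed, established) every time the tree runs. Below is a minimal, self-contained Python sketch of that pattern for the drawer example; all names here (WorldState, open_drawer, and so on) are illustrative assumptions, not the paper's implementation.

```python
# Minimal behavior-tree sketch: a fallback node tries alternatives until one
# succeeds, a sequence node enforces ordering. "Open the drawer before
# placing the object into it" then emerges from the tree structure rather
# than from a fixed script. Names are illustrative, not the paper's API.

from dataclasses import dataclass


@dataclass
class WorldState:
    drawer_open: bool = False
    object_in_drawer: bool = False


def sequence(*children):
    """Succeed only if every child succeeds, evaluated in order."""
    def run(world):
        return all(child(world) for child in children)
    return run


def fallback(*children):
    """Succeed as soon as any child succeeds (try alternatives in order)."""
    def run(world):
        return any(child(world) for child in children)
    return run


def drawer_is_open(world):          # condition leaf: reads the world state
    return world.drawer_open


def open_drawer(world):             # action leaf: stands in for a robot skill
    world.drawer_open = True
    return True


def place_object_in_drawer(world):  # action leaf: stands in for a robot skill
    world.object_in_drawer = True
    return True


# "Place into drawer" runs only once the drawer is known (or made) open.
place_into_drawer = sequence(
    fallback(drawer_is_open, open_drawer),
    place_object_in_drawer,
)

world = WorldState()
assert place_into_drawer(world) and world.object_in_drawer
```

Because the fallback re-evaluates its condition leaf on every tick, the same tree also covers the recovery case the abstract mentions: if the scene changes (say, the drawer is closed mid-task), the next tick simply re-triggers open_drawer before placing the object.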

Abstract (translated)

Human-robot collaboration in home and industrial workspaces is increasing. However, communication between robots and humans remains a bottleneck. Although people use combinations of different gesture types to complement speech, only a few robotic systems use gestures for communication. In this paper, we propose a gesture pseudo-language and show how multiple types of gestures can be combined to express human intent to a robot (i.e., expressing both the desired action and its parameters, e.g., pointing at an object and indicating that it should be emptied into a bowl). The demonstrated gestures and the perceived table-top scene (object poses detected by CosyPose) are processed in real time to extract the human's intent. We use behavior trees to generate reactive robot behavior that handles various possible world states (e.g., a drawer must be opened before an object can be placed into it) and recovers from errors (e.g., when the scene changes). Furthermore, our system allows switching between direct teleoperation of the end-effector and high-level operation using the proposed gesture sentences. The system is evaluated on increasingly complex tasks with a real 7-DoF Franka Emika Panda manipulator. Controlling the robot via action gestures reduced execution time by up to 60% compared to direct teleoperation.

URL

https://arxiv.org/abs/2303.04451

PDF

https://arxiv.org/pdf/2303.04451.pdf

