Paper Reading AI Learner

iCub Detecting Gazed Objects: A Pipeline Estimating Human Attention

2023-08-25 11:45:07
Shiva Hanifi, Elisa Maiettini, Maria Lombardi, Lorenzo Natale

Abstract

This paper explores the role of eye gaze in human-robot interaction and proposes a novel system for detecting the objects a human is gazing at using only visual feedback. The system combines face detection, human attention prediction, and online object detection, allowing the robot to perceive and interpret human gaze accurately and paving the way for establishing joint attention with human partners. Additionally, a novel dataset collected with the humanoid robot iCub is introduced, comprising over 22,000 images of ten participants gazing at different annotated objects. This dataset serves as a benchmark for evaluating the performance of the proposed pipeline. The paper also includes an experimental analysis of the pipeline's effectiveness in a human-robot interaction setting, examining the performance of each component. Furthermore, the developed system is deployed on the humanoid robot iCub, and a supplementary video showcases its functionality. The results demonstrate the potential of the proposed approach to enhance social awareness and responsiveness in social robotics, as well as to improve assistance and support in collaborative scenarios, promoting efficient human-robot collaboration. The code and the collected dataset will be released upon acceptance.
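The final stage of such a pipeline must associate the predicted gaze with one of the detected objects. A minimal sketch of that association step, assuming hypothetical stand-in outputs (a 2D gaze point from the attention predictor and labeled bounding boxes from the object detector; this is not the authors' implementation):

```python
# Hypothetical sketch of the last pipeline stage: given a predicted gaze
# point (from face detection + attention prediction) and bounding boxes
# (from online object detection), pick the object the human is gazing at.
# All inputs below are toy stand-ins, not outputs of the authors' models.

def select_gazed_object(gaze_point, detections):
    """Return the detection whose box contains (or is nearest to) the gaze.

    Each detection is (label, (x1, y1, x2, y2)) in image coordinates.
    """
    gx, gy = gaze_point

    def distance(box):
        x1, y1, x2, y2 = box
        # Euclidean distance from the gaze point to the box (0 if inside).
        dx = max(x1 - gx, 0, gx - x2)
        dy = max(y1 - gy, 0, gy - y2)
        return (dx * dx + dy * dy) ** 0.5

    return min(detections, key=lambda d: distance(d[1]))

# Toy example: the gaze point falls inside the "mug" box.
detections = [("mug", (100, 100, 160, 160)), ("book", (300, 80, 400, 200))]
print(select_gazed_object((130, 120), detections)[0])  # -> mug
```

In practice the attention predictor would output a heatmap rather than a single point, and the selection could instead score each box by the heatmap mass it covers; the nearest-box rule above is just the simplest readable variant.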

URL

https://arxiv.org/abs/2308.13318

PDF

https://arxiv.org/pdf/2308.13318.pdf
