Paper Reading AI Learner

Gaze estimation learning architecture as support to affective, social and cognitive studies in natural human-robot interaction

2024-10-25 08:21:48
Maria Lombardi, Elisa Maiettini, Agnieszka Wykowska, Lorenzo Natale

Abstract

Gaze is a crucial social cue in any interaction scenario and drives many mechanisms of social cognition (joint and shared attention, prediction of human intentions, coordination tasks). Gaze direction is also an indicator of social and emotional functions and affects the way emotions are perceived. Evidence shows that embodied humanoid robots endowed with social abilities can be seen as sophisticated stimuli to unravel many mechanisms of human social cognition while increasing engagement and ecological validity. In this context, building a robotic perception system that automatically estimates human gaze relying only on the robot's sensors remains a demanding task. The main goal of this paper is to propose a learning-based robotic architecture that estimates the human gaze direction in table-top scenarios without any external hardware. Table-top tasks are widely used in experimental psychology because they lend themselves to implementing numerous scenarios in which agents collaborate while maintaining face-to-face interaction. Such an architecture can provide valuable support in studies where external hardware might hinder spontaneous human behaviour, especially in environments less controlled than the laboratory (e.g., clinical settings). A novel dataset was also collected with the humanoid robot iCub, comprising annotated images of 24 participants in different gaze conditions.
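The abstract describes estimating gaze direction from the robot's own cameras but does not detail the model, so the following is only a minimal illustrative sketch (in PyTorch, assumed here) of one common approach: a small CNN that regresses a (yaw, pitch) gaze angle from a detected face crop. All module names, image sizes and parameters are hypothetical and are not taken from the paper.

# Minimal sketch (not the paper's architecture): a CNN regressing a 2D gaze
# direction (yaw, pitch) from a face crop taken from the robot's camera.
import torch
import torch.nn as nn

class GazeRegressor(nn.Module):
    """Maps a 3x96x96 face crop to a (yaw, pitch) gaze angle in radians."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 96 -> 48
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 48 -> 24
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling to 1x1
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 2),                 # yaw, pitch
        )

    def forward(self, face_crop: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(face_crop))

if __name__ == "__main__":
    model = GazeRegressor()
    face = torch.randn(1, 3, 96, 96)          # stand-in for a detected face crop
    yaw_pitch = model(face)
    print(yaw_pitch.shape)                     # torch.Size([1, 2])

In a full pipeline of this kind, the face crop would typically come from a face detector running on the robot's camera stream, and the regressed angles would then be mapped to candidate targets on the table-top.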


URL

https://arxiv.org/abs/2410.19374

PDF

https://arxiv.org/pdf/2410.19374.pdf

