Paper Reading AI Learner

Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition

2024-05-16 09:34:57
Yuchen Zhou, Linkai Liu, Chao Gou

Abstract

Most existing attention prediction research focuses on salient instances such as humans and objects. However, the more complex interaction-oriented attention, which arises as human observers comprehend the interactions between instances, remains largely unexplored. This attention is equally crucial for advancing human-machine interaction and human-centered artificial intelligence. To bridge this gap, we first collect a novel gaze fixation dataset named IG, comprising 530,000 fixation points across 740 diverse interaction categories, capturing the visual attention of human observers during their cognitive processing of interactions. Second, we introduce the zero-shot interaction-oriented attention prediction task ZeroIA, which challenges models to predict visual cues for interactions not encountered during training. Third, we present the Interactive Attention model IA, designed to emulate human observers' cognitive processes in order to tackle the ZeroIA problem. Extensive experiments demonstrate that the proposed IA outperforms other state-of-the-art approaches in both the ZeroIA and fully supervised settings. Lastly, we apply interaction-oriented attention to the interaction recognition task itself. Further experimental results demonstrate the promising potential of enhancing the performance and interpretability of existing state-of-the-art HOI models by incorporating real human attention data from IG and attention labels generated by IA.
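The abstract does not describe how raw gaze fixations become attention targets, but in saliency research the standard preprocessing step is to rasterize the (x, y) fixation points of all observers and blur them into a continuous density map. The sketch below illustrates that common recipe; the function name, `sigma` value, and normalization are illustrative assumptions, not details from the IG dataset pipeline.

```python
import numpy as np

def fixation_heatmap(fixations, height, width, sigma=20.0):
    """Turn a list of (x, y) gaze fixation points into a continuous
    attention map via Gaussian smoothing -- a common way to build
    ground-truth saliency targets from raw eye-tracking data.
    NOTE: illustrative sketch, not the IG dataset's actual pipeline."""
    heat = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        if 0 <= y < height and 0 <= x < width:
            heat[int(y), int(x)] += 1.0  # accumulate fixation counts

    # Separable Gaussian blur: convolve rows, then columns.
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-(t ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    heat = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, heat)
    heat = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, heat)

    if heat.max() > 0:
        heat /= heat.max()  # normalize to [0, 1], as saliency maps usually are
    return heat
```

A model trained for the ZeroIA-style task would then regress maps of this form for images whose interaction categories were held out during training.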


URL

https://arxiv.org/abs/2405.09931

PDF

https://arxiv.org/pdf/2405.09931.pdf
