Paper Reading AI Learner

AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification

2025-04-07 12:52:23
Wang Tang, Fethiye Irmak Dogan, Linbo Qing, Hatice Gunes

Abstract

Dyadic social relationships, i.e., relationships between two individuals who may or may not know each other through repeated interactions, are shaped by shared spatial and temporal experiences. Current computational methods for modeling these relationships face three major challenges: (1) the failure to model asymmetric relationships, e.g., one individual may perceive the other as a friend while the other perceives them as an acquaintance, (2) the disruption of continuous interactions by discrete frame sampling, which breaks the temporal continuity of interaction in real-world scenarios, and (3) the limited ability to consider periodic behavioral cues, such as rhythmic vocalizations or recurrent gestures, which are crucial for inferring the evolution of dyadic relationships. To address these challenges, we propose AsyReC, a multimodal graph-based framework for asymmetric dyadic relationship classification, with three core innovations: (i) a triplet graph neural network with node-edge dual attention that dynamically weights multimodal cues to capture interaction asymmetries (addressing challenge 1); (ii) a clip-level relationship learning architecture that preserves temporal continuity, enabling fine-grained modeling of real-world interaction dynamics (addressing challenge 2); and (iii) a periodic temporal encoder that projects time indices onto sine/cosine waveforms to model recurrent behavioral patterns (addressing challenge 3). Extensive experiments on two public datasets demonstrate state-of-the-art performance, while ablation studies validate the critical role of asymmetric interaction modeling and periodic temporal encoding in improving the robustness of dyadic relationship classification in real-world scenarios. Our code is publicly available at: this https URL.
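
The periodic temporal encoder described above resembles a sinusoidal time encoding, where each clip's time index is projected onto sine/cosine waveforms of different frequencies. The following is a minimal sketch of that idea, not the authors' implementation; the function name, `dim`, and `max_period` are illustrative assumptions.

```python
import numpy as np

def periodic_time_encoding(t_indices, dim=64, max_period=10000.0):
    """Project clip time indices onto sine/cosine waveforms.

    Hypothetical sketch in the spirit of sinusoidal positional encoding;
    `dim` and `max_period` are illustrative choices, not values from the paper.
    """
    t = np.asarray(list(t_indices), dtype=np.float32)[:, None]           # (T, 1)
    freqs = np.exp(-np.log(max_period) * np.arange(0, dim, 2) / dim)     # (dim/2,)
    angles = t * freqs[None, :]                                          # (T, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)     # (T, dim)

# Example: encode the time indices of 8 consecutive video clips
clip_encodings = periodic_time_encoding(range(8), dim=64)
print(clip_encodings.shape)  # (8, 64)
```

Because each dimension oscillates at a fixed frequency, clips that recur at regular intervals receive similar encodings, which is what allows recurrent behavioral patterns to be modeled.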

Abstract (translated)

Dyadic social relationships refer to relationships between two individuals who may or may not have come to know each other through repeated interactions, and they are shaped by shared spatial and temporal experiences. Current computational methods for modeling these relationships face three major challenges: (1) the inability to model asymmetric relationships, e.g., one person may regard the other as a friend while the other regards them merely as an acquaintance; (2) discrete frame sampling interrupts the continuity of interactions in real-world scenarios, breaking their temporal coherence; and (3) the limited consideration of periodic behavioral cues, such as rhythmic vocalizations or recurrent gestures, which are crucial for inferring the evolution of dyadic relationships. To address these challenges, we propose AsyReC, a multimodal graph-based framework for asymmetric dyadic relationship classification with three core innovations: (i) a triplet graph neural network with a node-edge dual attention mechanism that dynamically weights multimodal cues to capture asymmetries in interactions (addressing challenge 1); (ii) a clip-level relationship learning architecture that preserves temporal continuity and enables fine-grained modeling of real-world interaction dynamics (addressing challenge 2); and (iii) a periodic temporal encoder that projects time indices onto sine/cosine waveforms to model recurrent behavioral patterns (addressing challenge 3). Extensive experiments on two public datasets demonstrate superior performance, and ablation studies validate the key role of asymmetric interaction modeling and periodic temporal encoding in improving the robustness of dyadic relationship classification in real-world scenarios. Our code is available at: this https URL.

URL

https://arxiv.org/abs/2504.05030

PDF

https://arxiv.org/pdf/2504.05030.pdf

