Exp-Graph: How Connections Learn Facial Attributes in Graph-based Expression Recognition

2025-07-19 13:10:21
Nandani Sharma, Dinesh Singh

Abstract

Facial expression recognition is crucial for human-computer interaction applications such as face animation, video surveillance, affective computing, and medical analysis. Since the structure of facial attributes varies with facial expressions, incorporating structural information into facial attributes is essential for facial expression recognition. In this paper, we propose Exp-Graph, a novel framework that represents the structural relationships among facial attributes using graph-based modeling for facial expression recognition. In the facial attribute graph, facial landmarks serve as the vertices, while the edges are determined by the proximity of the facial landmarks and the similarity of the local appearance of the facial attributes, encoded using a vision transformer. Graph convolutional networks then capture these structural dependencies and integrate them into the encoding of facial attributes, thereby enhancing the accuracy of expression recognition. Thus, Exp-Graph learns highly expressive semantic representations from the facial attribute graphs. Together, the vision transformer and graph convolutional blocks help the framework exploit the local and global dependencies among facial attributes that are essential for recognizing facial expressions. We conducted comprehensive evaluations of the proposed Exp-Graph model on three benchmark datasets: Oulu-CASIA, eNTERFACE05, and AFEW. The model achieved recognition accuracies of 98.09%, 79.01%, and 56.39%, respectively. These results indicate that Exp-Graph maintains strong generalization across both controlled laboratory settings and real-world, unconstrained environments, underscoring its effectiveness for practical facial expression recognition applications.
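
The graph construction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact edge rule, so the k-nearest-neighbour proximity criterion, the cosine-similarity threshold, the 68-landmark layout, the random stand-ins for vision-transformer features, and the names build_adjacency and gcn_layer are all assumptions made for illustration. The propagation step uses the common symmetric-normalised graph convolution, which may differ from the layer used in the paper.

```python
import numpy as np


def build_adjacency(landmarks, features, k=3, sim_threshold=0.5):
    """Hypothetical edge rule (an assumption, not taken from the paper):
    link each landmark to its k nearest landmarks, and also link any pair
    whose local appearance features have cosine similarity above a threshold."""
    n = landmarks.shape[0]

    # Proximity: pairwise Euclidean distances between landmark coordinates.
    diff = landmarks[:, None, :] - landmarks[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a landmark is never its own neighbour

    adj = np.zeros((n, n), dtype=float)
    nearest = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1.0

    # Appearance: cosine similarity between per-landmark feature vectors.
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    adj[sim > sim_threshold] = 1.0

    # Make the graph undirected.
    return np.maximum(adj, adj.T)


def gcn_layer(adj, x, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ x @ weight, 0.0)


# Toy inputs: 68 landmarks (assumed layout) with random placeholder features
# standing in for vision-transformer patch embeddings.
rng = np.random.default_rng(0)
landmarks = rng.random((68, 2))
features = rng.standard_normal((68, 16))
weight = rng.standard_normal((16, 8)) * 0.1

adj = build_adjacency(landmarks, features)
out = gcn_layer(adj, features, weight)
```

Here adj is a symmetric 68 x 68 adjacency matrix and out holds one 8-dimensional encoding per landmark. In a full pipeline these encodings would be pooled and passed to a classifier over expression labels, a step that is omitted here.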

URL

https://arxiv.org/abs/2507.14608

PDF

https://arxiv.org/pdf/2507.14608.pdf

