Abstract
Facial expression recognition is crucial for human-computer interaction applications such as face animation, video surveillance, affective computing, and medical analysis. Since the structure of facial attributes varies with facial expressions, incorporating structural information into facial attributes is essential for facial expression recognition. In this paper, we propose Exp-Graph, a novel framework designed to represent the structural relationships among facial attributes using graph-based modeling for facial expression recognition. For the facial attribute graph representation, facial landmarks are used as the graph's vertices, while the edges are determined by the proximity of the facial landmarks and the similarity of the local appearance of the facial attributes encoded using a vision transformer. Additionally, graph convolutional networks are utilized to capture and integrate these structural dependencies into the encoding of facial attributes, thereby enhancing the accuracy of expression recognition. Thus, Exp-Graph learns highly expressive semantic representations from the facial attribute graphs. Moreover, the vision transformer and graph convolutional blocks help the framework exploit the local and global dependencies among the facial attributes that are essential for the recognition of facial expressions. We conducted comprehensive evaluations of the proposed Exp-Graph model on three benchmark datasets: Oulu-CASIA, eNTERFACE05, and AFEW. The model achieved recognition accuracies of 98.09%, 79.01%, and 56.39%, respectively. These results indicate that Exp-Graph maintains strong generalization capabilities across both controlled laboratory settings and real-world, unconstrained environments, underscoring its effectiveness for practical facial expression recognition applications.
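The graph construction described above (landmarks as vertices; edges from spatial proximity or local-appearance similarity; GCN propagation over the result) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the thresholds `dist_thresh` and `sim_thresh` are hypothetical hyperparameters, and `features` stands in for per-landmark embeddings from a vision transformer.

```python
import numpy as np

def build_attribute_graph(landmarks, features, dist_thresh=0.15, sim_thresh=0.5):
    """Connect vertices i, j when their landmarks are spatially close OR their
    local appearance features (e.g. ViT patch embeddings) are similar.
    Thresholds here are illustrative, not the paper's values."""
    n = len(landmarks)
    A = np.zeros((n, n))
    # Cosine similarity via L2-normalized features.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(landmarks[i] - landmarks[j]) < dist_thresh
            similar = feats[i] @ feats[j] > sim_thresh
            if close or similar:
                A[i, j] = A[j, i] = 1.0
    return A

def gcn_layer(A, X, W):
    """One standard GCN propagation step (Kipf & Welling style):
    H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)
```

A full model would stack such layers and pool the vertex features into an expression classifier; this sketch only shows how structural dependencies among facial attributes enter the encoding.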
URL
https://arxiv.org/abs/2507.14608