QCS: Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition

2024-11-04 11:20:17
Chengpeng Wang, Li Chen, Lili Wang, Zhaofan Li, Xuebin Lv

Abstract

On facial expression datasets with complex and numerous feature types, where the significance and dominance of labeled features are difficult to predict, facial expression recognition (FER) faces the challenges of inter-class similarity and intra-class variance, which make it difficult to mine effective features. We aim to address this by leveraging only the feature similarity among facial samples. We introduce Cross Similarity Attention (CSA), an input-output position-sensitive attention mechanism that harnesses feature similarity across different images to compute the corresponding global spatial attention. Based on this, we propose a four-branch circular framework, called Quadruplet Cross Similarity (QCS), that simultaneously extracts discriminative features from samples of the same class and eliminates redundant features shared across different classes, yielding cleaner refined features. The symmetry of the network ensures balanced and stable training and reduces the number of CSA interaction matrices. Contrastive residual distillation is used to transfer the information learned in the cross module back to the base network. The cross-attention module is used only during training; a single base branch is retained at inference. Our proposed QCS model outperforms state-of-the-art methods on several popular FER datasets, without requiring additional landmark information or other extra training data. The code is available at this https URL.
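
The abstract does not give the CSA formulation, so the following is only a rough sketch of the general idea it describes: deriving a global spatial attention map for each of two images from the similarity between their features. The function names, the use of cosine similarity, mean pooling over positions, and the softmax normalization are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def cross_similarity_attention(feat_a, feat_b):
    """Hypothetical sketch: spatial attention for two images from their cross similarity.

    feat_a, feat_b: feature maps of shape (C, H, W), assumed to come from a
    shared backbone. Returns two (H, W) attention maps, each summing to 1.
    """
    c, h, w = feat_a.shape
    # Flatten the spatial grid: one C-dimensional vector per position.
    a = feat_a.reshape(c, h * w).T  # (N, C), N = H * W
    b = feat_b.reshape(c, h * w).T  # (N, C)
    # L2-normalize so that dot products become cosine similarities (assumption).
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    # sim[i, j] = similarity between position i of image A and position j of image B.
    sim = a @ b.T  # (N, N)
    # One matrix serves both directions: rows score A's positions, columns score B's.
    score_a = sim.mean(axis=1)
    score_b = sim.mean(axis=0)
    return softmax(score_a).reshape(h, w), softmax(score_b).reshape(h, w)


rng = np.random.default_rng(0)
fa = rng.normal(size=(8, 4, 4))
fb = rng.normal(size=(8, 4, 4))
attn_a, attn_b = cross_similarity_attention(fa, fb)
print(attn_a.shape, attn_b.shape)
```

In this sketch a single similarity matrix is read along its rows for one image and along its columns for the other, which is one plausible reading of how a symmetric design could reduce the number of interaction matrices. How QCS combines such maps across its four branches is not specified in the abstract and is not modeled here.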

URL

https://arxiv.org/abs/2411.01988

PDF

https://arxiv.org/pdf/2411.01988.pdf

