Abstract
On facial expression datasets with complex and numerous feature types, where the significance and dominance of labeled features are difficult to predict, facial expression recognition (FER) faces the challenges of inter-class similarity and intra-class variance, making it difficult to mine effective features. We aim to address this by leveraging only the feature similarity among facial samples. We introduce Cross Similarity Attention (CSA), an input-output position-sensitive attention mechanism that exploits feature similarity across different images to compute the corresponding global spatial attention. Building on CSA, we propose a four-branch circular framework, Quadruplet Cross Similarity (QCS), that synchronously extracts discriminative features within the same class and eliminates redundant ones across different classes, yielding cleaner features. The symmetry of the network ensures balanced, stable training and reduces the number of CSA interaction matrices. Contrastive residual distillation transfers the information learned in the cross modules back to the base network. The cross-attention modules are used only during training; at inference, only a single base branch is retained. Our proposed QCS model outperforms state-of-the-art methods on several popular FER datasets without requiring facial landmark information or extra training data. The code is available at this https URL.
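To make the idea of cross-image similarity attention concrete, the following is a minimal PyTorch sketch of a block that compares the spatial positions of two images' feature maps and turns the resulting similarities into per-image spatial attention. It assumes standard scaled dot-product similarity with 1x1 projections; the class name `CrossSimilarityAttention`, `embed_dim`, and the mean-pooling aggregation are illustrative assumptions, not the paper's actual CSA formulation.

```python
# Hypothetical sketch of cross-image similarity attention; the paper's CSA may differ.
import torch
import torch.nn as nn


class CrossSimilarityAttention(nn.Module):
    def __init__(self, in_channels: int, embed_dim: int = 64):
        super().__init__()
        # 1x1 projections embed both feature maps before comparing them.
        self.proj_a = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        self.proj_b = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        self.scale = embed_dim ** -0.5

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # feat_a, feat_b: (B, C, H, W) feature maps from two different images.
        b, _, h, w = feat_a.shape
        qa = self.proj_a(feat_a).flatten(2).transpose(1, 2)  # (B, HW, D)
        kb = self.proj_b(feat_b).flatten(2)                  # (B, D, HW)

        # Similarity between every spatial position of image A and of image B.
        sim = torch.bmm(qa, kb) * self.scale                 # (B, HW_a, HW_b)

        # Aggregate similarities into one spatial attention map per image:
        # positions that match strongly in the other image receive more weight.
        attn_a = sim.mean(dim=2).softmax(dim=-1).view(b, 1, h, w)
        attn_b = sim.mean(dim=1).softmax(dim=-1).view(b, 1, h, w)

        # Reweight the original features with the cross-derived attention.
        return feat_a * attn_a, feat_b * attn_b
```

In a quadruplet setup along the lines described above, such a block would be applied to same-class pairs (to highlight shared discriminative regions) and different-class pairs (to suppress shared, non-discriminative regions), and would be dropped at inference once its effect has been distilled into the base branch.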
URL
https://arxiv.org/abs/2411.01988