Abstract
Facial expression recognition (FER) remains a challenging task because expressions are inherently ambiguous. The resulting noisy labels significantly harm performance in real-world scenarios. To address this issue, we present a new FER model named Landmark-Aware Net (LA-Net), which leverages facial landmarks to mitigate the impact of label noise from two perspectives. First, LA-Net uses landmark information to suppress uncertainty in the expression space and constructs a label distribution for each sample by neighborhood aggregation, which in turn improves the quality of training supervision. Second, the model incorporates landmark information into expression representations through a dedicated expression-landmark contrastive loss, making the enhanced expression feature extractor less susceptible to label noise. Our method can be integrated with any deep neural network for better training supervision without introducing extra inference cost. We conduct extensive experiments on both in-the-wild datasets and synthetic noisy datasets and demonstrate that LA-Net achieves state-of-the-art performance.
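The abstract names two mechanisms without giving their exact form, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes that neighbors are found by cosine similarity between landmark features, that neighbor labels are averaged with softmax weights, and that the expression-landmark contrastive loss is InfoNCE-style. The function names, `k`, and `temperature` are hypothetical.

```python
import numpy as np


def aggregate_label_distributions(landmark_feats, noisy_labels, num_classes,
                                  k=3, temperature=0.1):
    """Build a soft label per sample from its k nearest neighbors in
    landmark-feature space (assumed rule, not taken from the paper)."""
    # L2-normalize so that the dot product equals cosine similarity.
    z = landmark_feats / np.linalg.norm(landmark_feats, axis=1, keepdims=True)
    sim = z @ z.T
    one_hot = np.eye(num_classes)[noisy_labels]
    soft = np.zeros_like(one_hot)
    for i in range(len(z)):
        # The k most similar samples; this includes sample i itself.
        nbrs = np.argsort(-sim[i])[:k]
        w = np.exp(sim[i, nbrs] / temperature)
        w /= w.sum()
        # Similarity-weighted average of the neighbors' one-hot labels.
        soft[i] = w @ one_hot[nbrs]
    return soft


def expression_landmark_contrastive_loss(expr_feats, landmark_feats,
                                         temperature=0.1):
    """InfoNCE-style loss (an assumption): the expression feature of sample i
    should match the landmark feature of sample i, not those of other samples."""
    e = expr_feats / np.linalg.norm(expr_feats, axis=1, keepdims=True)
    m = landmark_feats / np.linalg.norm(landmark_feats, axis=1, keepdims=True)
    logits = (e @ m.T) / temperature
    # Subtract the row maximum for numerical stability before the log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return -np.mean(np.diag(log_prob))


# Toy data: two landmark clusters; sample 2 carries a deliberately wrong label.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
                  [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]])
labels = np.array([0, 0, 1, 2, 2, 2])
soft_labels = aggregate_label_distributions(feats, labels, num_classes=3)
loss = expression_landmark_contrastive_loss(feats, feats)
```

In this toy setup the mislabeled sample is outvoted by its two landmark neighbors, which illustrates how aggregation could soften a noisy label. Both functions would act only on the training objective, in line with the abstract's claim that nothing is added at inference time.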
URL
https://arxiv.org/abs/2307.09023