POSTER V2: A simpler and stronger facial expression recognition network

2023-01-28 10:23:44
Jiawei Mao, Rui Xu, Xuesong Yin, Yuanqi Chang, Binling Nie, Aibin Huang

Abstract

Facial expression recognition (FER) plays an important role in a variety of real-world applications such as human-computer interaction. POSTER V1 achieves state-of-the-art (SOTA) performance in FER by effectively combining facial landmark and image features through a two-stream pyramid cross-fusion design. However, the architecture of POSTER V1 is undeniably complex and incurs expensive computational costs. To relieve the computational pressure of POSTER V1, in this paper we propose POSTER V2. It improves on POSTER V1 in three directions: cross-fusion, the two-stream design, and multi-scale feature extraction. In cross-fusion, we replace the vanilla cross-attention mechanism with a window-based cross-attention mechanism. In the two-stream design, we remove the image-to-landmark branch. For multi-scale feature extraction, POSTER V2 combines multi-scale image and landmark features, replacing POSTER V1's pyramid design. Extensive experiments on several standard datasets show that POSTER V2 achieves SOTA FER performance at minimal computational cost: it reaches 92.21% on RAF-DB, 67.49% on AffectNet (7 cls), and 63.77% on AffectNet (8 cls) using only 8.4G floating-point operations (FLOPs) and 43.7M parameters (Param), demonstrating the effectiveness of our improvements. The code and models are available at this https URL.
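
To make the cross-fusion change concrete, below is a minimal PyTorch sketch of window-based cross-attention. It is not the authors' implementation: the class name, window size, feature shapes, and the choice of taking queries from the landmark stream and keys/values from the image stream are illustrative assumptions.

```python
# Hypothetical sketch of window-based cross-attention (NOT the POSTER V2 code).
# Assumptions: square feature maps divisible by the window size; queries come
# from landmark features, keys/values from image features.
import torch
import torch.nn as nn

class WindowCrossAttention(nn.Module):
    def __init__(self, dim, num_heads=8, window_size=7):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def partition(self, x):
        # (B, H, W, C) -> (B * num_windows, window_size**2, C)
        B, H, W, C = x.shape
        s = self.window_size
        x = x.view(B, H // s, s, W // s, s, C)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, s * s, C)

    def forward(self, landmark_feat, image_feat):
        # Both inputs: (B, H, W, C) feature maps of equal spatial size.
        q = self.partition(landmark_feat)   # queries from the landmark stream
        kv = self.partition(image_feat)     # keys/values from the image stream
        out, _ = self.attn(q, kv, kv)       # attention restricted to each window
        return out                          # (B * num_windows, s*s, C)

# Toy usage: attention cost scales with the window area, not the full H*W.
lm = torch.randn(2, 14, 14, 256)
im = torch.randn(2, 14, 14, 256)
print(WindowCrossAttention(256)(lm, im).shape)  # torch.Size([8, 49, 256])
```

Restricting cross-attention to local windows makes its cost quadratic in the window area rather than in the full H×W sequence length, which is consistent with the FLOPs reduction the abstract reports; the exact savings depend on the authors' configuration.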

URL

https://arxiv.org/abs/2301.12149

PDF

https://arxiv.org/pdf/2301.12149.pdf