Abstract
Facial expression is driven by facial muscle contractions, and different muscle movements correspond to different emotional states. In micro-expression recognition, the muscle movements are usually subtle, which degrades the performance of current facial emotion recognition algorithms. Most existing methods use self-attention mechanisms to capture relationships between tokens in a sequence, but they do not account for the inherent spatial relationships between facial landmarks, which can result in sub-optimal performance on micro-expression recognition tasks. Learning to recognize facial muscle movements is therefore a key challenge in micro-expression recognition. In this paper, we propose a Hierarchical Transformer Network (HTNet) to identify critical areas of facial muscle movement. HTNet comprises two major components: a transformer layer that leverages local temporal features and an aggregation layer that extracts local and global semantic facial features. Specifically, HTNet divides the face into four areas: the left lip, left eye, right eye, and right lip areas. The transformer layer uses local self-attention within each area to represent subtle local muscle movements, and the aggregation layer learns the interactions between the eye and lip areas. Experiments on four publicly available micro-expression datasets show that the proposed approach outperforms previous methods by a large margin. The code and models are available at: \url{this https URL}
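The hierarchical idea described in the abstract — local self-attention within each facial area, followed by an aggregation step across areas — can be sketched roughly as below. This is a minimal NumPy illustration, not the authors' implementation: it assumes single-head, parameter-free attention, mean pooling per area, and hypothetical area names and feature shapes.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(tokens):
    # Scaled dot-product self-attention over one facial area's tokens
    # (single head, no learned projections -- a simplification).
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)   # (n, n) pairwise similarities
    return softmax(scores, axis=-1) @ tokens  # (n, d) attended features

def htnet_like_forward(face_areas):
    # face_areas: dict mapping a (hypothetical) area name to an
    # (n_tokens, d) array of local features for that area.
    # Transformer layer: local self-attention within each area.
    local = {k: local_self_attention(v) for k, v in face_areas.items()}
    # Aggregation layer (sketch): pool each area to one token, then
    # attend across the four area tokens to model eye/lip interactions.
    order = ("left_eye", "right_eye", "left_lip", "right_lip")
    pooled = np.stack([local[k].mean(axis=0) for k in order])  # (4, d)
    fused = local_self_attention(pooled)                       # (4, d)
    return fused.mean(axis=0)                                  # (d,) global embedding

rng = np.random.default_rng(0)
areas = {k: rng.normal(size=(6, 16))
         for k in ("left_eye", "right_eye", "left_lip", "right_lip")}
emb = htnet_like_forward(areas)
print(emb.shape)  # (16,)
```

In the actual model, each stage would use learned query/key/value projections and multiple heads; the sketch only shows the two-level (local, then cross-area) attention structure.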
URL
https://arxiv.org/abs/2307.14637