Abstract
Facial micro-expression recognition (MER) is a challenging problem due to the transient and subtle nature of micro-expression (ME) actions. Most existing methods depend on hand-crafted features, on key frames such as the onset, apex, and offset frames, or on deep networks limited by small-scale, low-diversity datasets. In this paper, we propose an end-to-end micro-action-aware deep learning framework that combines the advantages of transformers, graph convolution, and vanilla convolution. In particular, we propose a novel F5C block, composed of a fully-connected convolution and a channel correspondence convolution, to extract local-global features directly from a sequence of raw frames, without prior knowledge of key frames. The transformer-style fully-connected convolution extracts local features while maintaining a global receptive field, and the graph-style channel correspondence convolution models the correlations among feature patterns. Moreover, MER, optical flow estimation, and facial landmark detection are jointly trained by sharing the local-global features. The latter two tasks help capture subtle facial action information for MER, which alleviates the impact of insufficient training data. Extensive experiments demonstrate that our framework (i) outperforms state-of-the-art MER methods on the CASME II, SAMM, and SMIC benchmarks, (ii) works well for optical flow estimation and facial landmark detection, and (iii) can capture subtle facial muscle actions in local regions associated with MEs. The code is available at this https URL.
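Since the abstract does not give implementation details, the following is a minimal PyTorch sketch of how an F5C-style block (a transformer-style fully-connected convolution followed by a graph-style channel correspondence convolution) could be wired. All class names, the attention formulation, and the learned channel adjacency are assumptions made for illustration, not the authors' released code.

```python
# Minimal, assumption-based sketch of an F5C-style block as described in the
# abstract. Layer names and hyperparameters are illustrative, not the paper's code.
import torch
import torch.nn as nn


class FullyConnectedConv(nn.Module):
    """Transformer-style convolution: local patch features re-weighted by a
    global (fully-connected) attention map, keeping a global receptive field."""

    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.local(x)                  # local features
        q = self.query(x).flatten(2)           # (B, C, HW)
        k = self.key(x).flatten(2)             # (B, C, HW)
        attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)  # (B, HW, HW)
        glob = (local.flatten(2) @ attn.transpose(1, 2)).view(b, c, h, w)
        return local + glob                    # local features with global context


class ChannelCorrespondenceConv(nn.Module):
    """Graph-style convolution over channels: each channel (feature pattern) is a
    node; a learned adjacency mixes correlated channels."""

    def __init__(self, channels):
        super().__init__()
        self.adjacency = nn.Parameter(torch.eye(channels))  # learned channel graph
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        nodes = x.flatten(2)                   # (B, C, HW): one node per channel
        mixed = torch.softmax(self.adjacency, dim=-1) @ nodes
        return self.proj(mixed.view(b, c, h, w)) + x


class F5CBlock(nn.Module):
    """Fully-connected convolution followed by channel correspondence convolution."""

    def __init__(self, channels):
        super().__init__()
        self.fcc = FullyConnectedConv(channels)
        self.ccc = ChannelCorrespondenceConv(channels)

    def forward(self, x):
        return self.ccc(self.fcc(x))


if __name__ == "__main__":
    feats = torch.randn(2, 64, 28, 28)         # features from a raw-frame encoder
    print(F5CBlock(64)(feats).shape)           # torch.Size([2, 64, 28, 28])
```

In the multi-task setting described above, the local-global features produced by such a block would be shared by the MER, optical flow estimation, and facial landmark detection heads; the exact head designs and losses are not specified in the abstract.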
URL
https://arxiv.org/abs/2506.14511