Abstract
Most facial landmark detection methods predict landmarks by mapping input facial appearance features to landmark heatmaps, and they have achieved promising results. However, when face images exhibit large poses, heavy occlusions, or complicated illumination, these methods cannot learn discriminative feature representations or effective facial shape constraints, nor can they accurately predict the value of each element in the landmark heatmap, which limits their detection accuracy. To address this problem, we propose a novel Reference Heatmap Transformer (RHT) that introduces reference heatmap information for more precise facial landmark detection. The proposed RHT consists of a Soft Transformation Module (STM) and a Hard Transformation Module (HTM), which cooperate to encourage accurate transformation of the reference heatmap information and facial shape constraints. A Multi-Scale Feature Fusion Module (MSFFM) then fuses the transformed heatmap features with the semantic features learned from the original face images, enhancing the feature representations used to produce more accurate target heatmaps. To the best of our knowledge, this is the first study to explore how to enhance facial landmark detection by transforming reference heatmap information. Experimental results on challenging benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods in the literature.
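For readers unfamiliar with the heatmap representation the abstract refers to, the following is a minimal sketch of the generic idea only, not of RHT or any of its modules. It assumes one Gaussian-shaped heatmap per landmark and a simple argmax decoding; the function names, the 64x64 resolution, and the sigma value are illustrative assumptions, not details from the paper.

```python
import numpy as np

def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Build one heatmap whose peak sits at pixel (cx, cy). Hypothetical helper."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def decode_landmarks(heatmaps):
    """Recover (x, y) per landmark by taking the argmax of each heatmap.

    heatmaps: array of shape (num_landmarks, height, width).
    """
    num_landmarks, height, width = heatmaps.shape
    flat_idx = heatmaps.reshape(num_landmarks, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (height, width))
    return np.stack([xs, ys], axis=1)

# Illustrative landmark positions (assumed values, not dataset annotations).
centers = [(10, 20), (40, 33), (5, 60)]
heatmaps = np.stack([gaussian_heatmap(64, 64, cx, cy) for cx, cy in centers])
coords = decode_landmarks(heatmaps)
```

Because decoding depends on where each heatmap peaks, errors in individual heatmap values translate directly into landmark localization errors, which is the limitation the abstract attributes to existing methods under hard conditions.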
URL
https://arxiv.org/abs/2303.07840