Abstract
Humans can infer head pose from face shape, and vice versa, which indicates a strong correlation between the two. Accordingly, recent studies on face alignment have used head pose information to predict facial landmarks. In this study, we propose a novel method that improves face alignment performance by fusing head pose information with the feature maps of a face alignment network, rather than using it only to initialize facial landmarks. Furthermore, the proposed dual-dimensional network performs robust face alignment using multidimensional features represented by 2D feature maps and a 3D heatmap. For effective dense face alignment, we also propose a prediction method for facial geometric landmarks through training based on knowledge distillation using predicted keypoints. We experimentally assessed the correlation between the predicted facial landmarks and head pose information, as well as how the accuracy of the facial landmarks varies with the quality of the head pose information. In addition, we demonstrated the effectiveness of the proposed method, which achieves competitive performance against state-of-the-art methods on the AFLW2000-3D, AFLW, and BIWI datasets.
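The abstract does not spell out how head pose information is fused with the feature maps. As a rough illustration only, the sketch below shows one common way to combine a global vector with spatial features: each pose angle is broadcast into a constant channel and concatenated with the existing channels. The function name `fuse_pose_with_features`, the (yaw, pitch, roll) ordering, the degree-based scaling, and the nested-list layout are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence

# Assumed layout: channels x height x width, stored as nested lists.
FeatureMaps = List[List[List[float]]]


def fuse_pose_with_features(feature_maps: FeatureMaps,
                            pose_deg: Sequence[float]) -> FeatureMaps:
    """Append one constant channel per pose angle (hypothetical fusion scheme)."""
    if not feature_maps or not feature_maps[0] or not feature_maps[0][0]:
        raise ValueError("feature_maps must be a non-empty C x H x W nested list")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    # Assumption: angles arrive in degrees; dividing by 180 maps them to [-1, 1].
    scaled = [angle / 180.0 for angle in pose_deg]
    # Broadcast each scalar angle over the full spatial grid.
    pose_channels = [[[value] * width for _ in range(height)] for value in scaled]
    # Channel-wise concatenation: copied original channels first, pose channels last.
    copied = [[row[:] for row in channel] for channel in feature_maps]
    return copied + pose_channels


# Toy input: a single 2 x 2 feature channel and an assumed (yaw, pitch, roll) in degrees.
features = [[[0.1, 0.2], [0.3, 0.4]]]
fused = fuse_pose_with_features(features, (90.0, -45.0, 0.0))
print(len(fused), len(fused[0]), len(fused[0][0]))
```

In a deep learning framework the same idea would typically be a broadcast followed by a channel-wise concatenation of tensors. Other fusion strategies, such as feature-wise modulation or attention, are equally plausible readings of the abstract.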
URL
https://arxiv.org/abs/2308.13327