3D Face Alignment Through Fusion of Head Pose Information and Features

2023-08-25 12:01:24
Jaehyun So, Youngjoon Han

Abstract

The ability of humans to infer head poses from face shapes, and vice versa, indicates a strong correlation between the two. Accordingly, recent studies on face alignment have employed head pose information to predict facial landmarks in computer vision tasks. In this study, we propose a novel method that employs head pose information to improve face alignment performance by fusing said information with the feature maps of a face alignment network, rather than simply using it to initialize facial landmarks. Furthermore, the proposed network structure performs robust face alignment through a dual-dimensional network using multidimensional features represented by 2D feature maps and a 3D heatmap. For effective dense face alignment, we also propose a prediction method for facial geometric landmarks through training based on knowledge distillation using predicted keypoints. We experimentally assessed the correlation between the predicted facial landmarks and head pose information, as well as variations in the accuracy of facial landmarks with respect to the quality of head pose information. In addition, we demonstrated the effectiveness of the proposed method through a competitive performance comparison with state-of-the-art methods on the AFLW2000-3D, AFLW, and BIWI datasets.

URL

https://arxiv.org/abs/2308.13327

PDF

https://arxiv.org/pdf/2308.13327.pdf

