A lightweight 3D dense facial landmark estimation model from position map data

2023-08-29 09:53:10
Shubhajit Basak, Sathish Mangapuram, Gabriel Costache, Rachel McDonnell, Michael Schukat

Abstract

The incorporation of 3D data in facial analysis tasks has gained popularity in recent years. Though it provides a more accurate and detailed representation of the human face, acquiring 3D face data is more complex and expensive than acquiring 2D face images. One has to rely either on expensive 3D scanners or on depth sensors, which are prone to noise. An alternative is to reconstruct 3D faces from uncalibrated 2D images in an unsupervised way, without any ground-truth 3D data. However, such approaches are computationally expensive, and the learned model size is not suitable for mobile or other edge device applications. Predicting dense 3D landmarks over the whole face can overcome this issue. As there is no public dataset available containing dense landmarks, we propose a pipeline to create a dense keypoint training dataset containing 520 key points across the whole face from existing facial position map data. We train a lightweight MobileNet-based regressor model with the generated data. As we do not have access to any evaluation dataset with dense landmarks, we evaluate our model against the 68 keypoint detection task. Experimental results show that our trained model outperforms many of the existing methods in spite of its lower model size and minimal computational cost. Also, the qualitative evaluation shows the efficiency of our trained models in extreme head pose angles as well as other facial variations and occlusions.
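
The idea of building dense keypoints from a position map can be illustrated with a minimal sketch. It assumes a position map stored as an H x W x 3 array whose pixels hold (x, y, z) face coordinates, and it picks 520 UV locations on a regular 20 x 26 grid. The grid layout, the 256 x 256 map size, and the function names `make_uv_indices` and `sample_keypoints` are hypothetical choices for illustration; the abstract does not say how the authors select their 520 points.

```python
import numpy as np

def make_uv_indices(height, width, rows=20, cols=26):
    """Return (rows * cols, 2) integer (v, u) pixel locations on a regular grid.

    Assumption: a uniform grid (20 * 26 = 520) stands in for whatever
    keypoint selection the paper actually uses.
    """
    vs = np.linspace(0, height - 1, rows).round().astype(int)
    us = np.linspace(0, width - 1, cols).round().astype(int)
    vv, uu = np.meshgrid(vs, us, indexing="ij")
    return np.stack([vv.ravel(), uu.ravel()], axis=1)

def sample_keypoints(position_map, uv_indices):
    """Read the 3D coordinate stored at each selected UV pixel: shape (K, 3)."""
    return position_map[uv_indices[:, 0], uv_indices[:, 1], :]

# Stand-in data: a random array instead of a real facial position map.
rng = np.random.default_rng(0)
position_map = rng.random((256, 256, 3)).astype(np.float32)

uv = make_uv_indices(256, 256)
keypoints = sample_keypoints(position_map, uv)
print(keypoints.shape)  # (520, 3)
```

Because the same UV indices are reused for every face, the sampled points keep a consistent order across the dataset, which is what a regressor with a fixed-size output needs.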

URL

https://arxiv.org/abs/2308.15170

PDF

https://arxiv.org/pdf/2308.15170.pdf

