Paper Reading AI Learner

Towards Multi-domain Face Landmark Detection with Synthetic Data from Diffusion model

2024-01-24 02:35:32
Yuanming Li, Gwantae Kim, Jeong-gi Kwak, Bon-hwa Ku, Hanseok Ko

Abstract

Recently, deep learning-based facial landmark detection for in-the-wild faces has achieved significant improvement. However, there are still challenges in face landmark detection in other domains (e.g. cartoon, caricature, etc). This is due to the scarcity of extensively annotated training data. To tackle this concern, we design a two-stage training approach that effectively leverages limited datasets and the pre-trained diffusion model to obtain aligned pairs of landmarks and face in multiple domains. In the first stage, we train a landmark-conditioned face generation model on a large dataset of real faces. In the second stage, we fine-tune the above model on a small dataset of image-landmark pairs with text prompts for controlling the domain. Our new designs enable our method to generate high-quality synthetic paired datasets from multiple domains while preserving the alignment between landmarks and facial features. Finally, we fine-tuned a pre-trained face landmark detection model on the synthetic dataset to achieve multi-domain face landmark detection. Our qualitative and quantitative results demonstrate that our method outperforms existing methods on multi-domain face landmark detection.

Abstract (translated)

近年来,基于深度学习的野外面部关键点检测取得了显著的改进。然而,在其他领域(如卡通、漫画等)进行面部关键点检测仍然具有挑战性。这是因为缺乏大量注释的训练数据。为了应对这一问题,我们设计了一种两阶段训练方法,有效利用有限的数据集和预训练扩散模型,在多个领域获得对齐的关键点和面部。在第一阶段,我们在大量真实面部分割的大数据集上训练了一个带有关键点条件的面部生成模型。在第二阶段,我们在一个小型数据集中对上述模型进行微调,该数据集包含用于控制领域的图像关键点对。我们的新设计使我们的方法能够从多个领域生成高质量合成对齐的关键点数据,同时保留关键点与面部特征之间的对齐关系。最后,我们在合成数据集上对预训练面部关键点检测模型进行微调,以实现多领域面部关键点检测。我们的定性和定量结果表明,我们的方法在多领域面部关键点检测方面超过了现有方法。

URL

https://arxiv.org/abs/2401.13191

PDF

https://arxiv.org/pdf/2401.13191.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot