Abstract
Recently, deep learning-based facial landmark detection for in-the-wild faces has improved significantly. However, face landmark detection in other domains (e.g., cartoon, caricature) remains challenging because extensively annotated training data are scarce. To address this, we design a two-stage training approach that effectively leverages limited datasets and a pre-trained diffusion model to obtain aligned pairs of landmarks and faces in multiple domains. In the first stage, we train a landmark-conditioned face generation model on a large dataset of real faces. In the second stage, we fine-tune this model on a small dataset of image-landmark pairs, with text prompts for controlling the domain. Our new designs enable our method to generate high-quality synthetic paired datasets from multiple domains while preserving the alignment between landmarks and facial features. Finally, we fine-tune a pre-trained face landmark detection model on the synthetic dataset to achieve multi-domain face landmark detection. Our qualitative and quantitative results demonstrate that our method outperforms existing methods on multi-domain face landmark detection.
URL
https://arxiv.org/abs/2401.13191