Abstract
The incorporation of 3D data in facial analysis tasks has gained popularity in recent years. Although 3D data provides a more accurate and detailed representation of the human face, acquiring it is more complex and expensive than collecting 2D face images: one has to rely either on expensive 3D scanners or on depth sensors, which are prone to noise. An alternative is to reconstruct 3D faces from uncalibrated 2D images in an unsupervised way, without any ground-truth 3D data. However, such approaches are computationally expensive, and the size of the learned model makes it unsuitable for mobile or other edge-device applications. Predicting dense 3D landmarks over the whole face can overcome this issue. As no public dataset containing dense landmarks is available, we propose a pipeline that creates a dense keypoint training dataset, containing 520 keypoints across the whole face, from existing facial position map data. We train a lightweight MobileNet-based regressor model on the generated data. Because we do not have access to any evaluation dataset with dense landmarks, we evaluate our model on the 68-keypoint detection task. Experimental results show that our trained model outperforms many existing methods despite its smaller model size and minimal computational cost. In addition, qualitative evaluation shows that our trained model remains effective at extreme head pose angles as well as under other facial variations and occlusions.
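To make the dataset-creation idea concrete, the following is a minimal sketch of how a fixed set of 520 keypoints could be read out of a facial position map. It is not the authors' pipeline: the abstract does not say how the 520 locations are chosen, so the uniform 20 x 26 grid, the 256 x 256 map size, and the names `grid_uv_indices` and `sample_keypoints` are all assumptions made for illustration. The only idea taken from the text is that each keypoint is the 3D coordinate stored at a fixed location of the position map.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
# A position map is indexed as pos_map[v][u]; each cell stores an (x, y, z) point.
PositionMap = Sequence[Sequence[Point3D]]


def grid_uv_indices(size: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """Return rows * cols (v, u) pixel indices spread evenly over a size x size map.

    The uniform grid is a hypothetical selection rule, not the one used in the paper.
    """
    # Cell-centred spacing keeps the samples away from the map border.
    vs = [int((r + 0.5) * size / rows) for r in range(rows)]
    us = [int((c + 0.5) * size / cols) for c in range(cols)]
    return [(v, u) for v in vs for u in us]


def sample_keypoints(pos_map: PositionMap,
                     uv_indices: Sequence[Tuple[int, int]]) -> List[Point3D]:
    """Look up the 3D point stored at each fixed (v, u) location."""
    return [pos_map[v][u] for v, u in uv_indices]


# Toy position map (assumed 256 x 256): each cell just stores its own (u, v, 0).
SIZE = 256
toy_map = [[(float(u), float(v), 0.0) for u in range(SIZE)] for v in range(SIZE)]

# 20 rows x 26 columns = 520 sampling locations (an assumed layout).
uv = grid_uv_indices(SIZE, rows=20, cols=26)
keypoints = sample_keypoints(toy_map, uv)
print(len(keypoints))
```

Because the same index list is reused for every position map, the resulting keypoints stay in semantic correspondence across faces, which is what a regressor trained on them would need.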
URL
https://arxiv.org/abs/2308.15170