Abstract
When working with 3D facial data, improving fidelity and avoiding the uncanny valley effect depend critically on accurate 3D facial performance capture. Because such capture setups are expensive, and because 2D videos are widely available, recent methods have focused on monocular 3D face tracking. However, these methods often fall short in capturing precise facial movements due to limitations in their network architecture, training, and evaluation processes. Addressing these challenges, we propose a novel face tracker, FlowFace, that introduces an innovative 2D alignment network for dense per-vertex alignment. Unlike prior work, FlowFace is trained on high-quality 3D scan annotations rather than weak supervision or synthetic data. Our 3D model fitting module jointly fits a 3D face model from one or more observations, integrating existing neutral shape priors for enhanced identity and expression disentanglement, and per-vertex deformations for detailed facial feature reconstruction. Additionally, we propose a novel metric and benchmark for assessing tracking accuracy. Our method exhibits superior performance on both custom and publicly available benchmarks. We further validate the effectiveness of our tracker by generating high-quality 3D data from 2D videos, which leads to performance gains on downstream tasks.
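The abstract does not specify the proposed metric, but dense per-vertex tracking accuracy is commonly summarized as the mean Euclidean distance between tracked and ground-truth mesh vertices. The sketch below illustrates that standard formulation only; the paper's actual metric and benchmark may differ.

```python
import numpy as np

def mean_per_vertex_error(pred, gt):
    """Mean Euclidean distance between corresponding vertices.

    pred, gt: (T, V, 3) arrays of tracked and ground-truth vertex
    positions over T frames and V vertices. This is a generic
    dense-alignment accuracy measure, not necessarily the metric
    proposed in the paper.
    """
    assert pred.shape == gt.shape
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: two frames, three vertices, a uniform 1-unit x offset.
gt = np.zeros((2, 3, 3))
pred = gt + np.array([1.0, 0.0, 0.0])
print(mean_per_vertex_error(pred, gt))  # 1.0
```

Per-frame or per-region variants follow by averaging over a subset of the axes instead of the full array.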
URL
https://arxiv.org/abs/2404.09819