Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

2023-06-13 07:08:22
Ricong Huang, Peiwen Lai, Yipeng Qin, Guanbin Li

Abstract

Audio-driven facial reenactment is a crucial technique with a range of applications in film-making, virtual avatars, and video conferencing. Existing works employ either explicit intermediate face representations (e.g., 2D facial landmarks or 3D face models) or implicit ones (e.g., Neural Radiance Fields), and thus suffer from a trade-off between interpretability and expressive power, and hence between controllability and result quality. In this work, we break this trade-off with our novel parametric implicit face representation and propose a novel audio-driven facial reenactment framework that is both controllable and capable of generating high-quality talking heads. Specifically, our parametric implicit representation parameterizes the implicit representation with interpretable parameters of 3D face models, thereby combining the strengths of both explicit and implicit methods. In addition, we propose several new techniques to improve the three components of our framework, including i) incorporating contextual information into the audio-to-expression-parameter encoding; ii) using conditional image synthesis to parameterize the implicit representation and implementing it with an innovative tri-plane structure for efficient learning; iii) formulating facial reenactment as a conditional image inpainting problem and proposing a novel data augmentation technique to improve model generalizability. Extensive experiments demonstrate that our method can generate more realistic results than previous methods, with greater fidelity to the identities and talking styles of speakers.
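
The abstract names contextual audio-to-expression encoding (component i) without giving implementation details. As a rough illustration only, the sketch below maps a window of audio features to per-frame 3DMM expression parameters with a bidirectional recurrent model, so each frame's prediction is conditioned on surrounding context; the class name, feature dimensions, and architecture are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class AudioToExpression(nn.Module):
    """Hypothetical sketch: maps a window of audio features to 3DMM
    expression parameters. A bidirectional LSTM lets each frame's
    prediction use both past and future context, illustrating the
    'contextual information' idea; the paper's real architecture
    is not specified in the abstract."""

    def __init__(self, audio_dim=80, hidden=256, n_exp=64):
        super().__init__()
        self.temporal = nn.LSTM(audio_dim, hidden, batch_first=True,
                                bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_exp)

    def forward(self, audio):          # audio: (B, T, audio_dim)
        ctx, _ = self.temporal(audio)  # (B, T, 2*hidden), per-frame context
        return self.head(ctx)          # (B, T, n_exp) expression params

# Example: 2 clips, 16 frames of 80-dim mel features each.
model = AudioToExpression()
mel = torch.randn(2, 16, 80)
exprs = model(mel)  # (2, 16, 64)
```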
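
For component ii), the abstract mentions a tri-plane structure but gives no equations. A tri-plane query, in the standard formulation popularized by EG3D, projects each 3D point onto three axis-aligned feature planes, bilinearly samples each plane, and aggregates the results; the sketch below assumes that standard formulation, and the function name and tensor shapes are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes, xyz):
    """Query a tri-plane representation at 3D points (generic sketch).

    planes: (3, C, H, W) feature planes for the XY, XZ, YZ planes.
    xyz:    (N, 3) query points in [-1, 1]^3.
    Returns (N, C) features, summed over the three planes.
    """
    # Project each 3D point onto the three axis-aligned planes.
    coords = torch.stack([
        xyz[:, [0, 1]],  # XY plane
        xyz[:, [0, 2]],  # XZ plane
        xyz[:, [1, 2]],  # YZ plane
    ])                               # (3, N, 2)

    # grid_sample expects sampling grids of shape (B, H_out, W_out, 2).
    grid = coords.unsqueeze(1)       # (3, 1, N, 2)
    feats = F.grid_sample(planes, grid, mode='bilinear',
                          align_corners=False)  # (3, C, 1, N)
    return feats.squeeze(2).sum(dim=0).T        # (N, C)

# Example: 32-channel planes at 256x256 resolution, 1024 query points.
planes = torch.randn(3, 32, 256, 256)
points = torch.rand(1024, 3) * 2 - 1
features = sample_triplane(planes, points)  # (1024, 32)
```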

URL

https://arxiv.org/abs/2306.07579

PDF

https://arxiv.org/pdf/2306.07579.pdf

