Abstract
Facial video inpainting plays a crucial role in a wide range of applications, including the removal of obstructions in video conferencing and telemedicine, enhancement of facial expression analysis, privacy protection, integration of graphical overlays, and virtual makeup. The domain poses serious challenges: facial features are intricate, and innate human familiarity with faces makes viewers highly sensitive to artifacts, heightening the need for accurate and convincing completions. Addressing occlusion removal in this context, we focus on the demanding task of generating complete images from masked facial data while ensuring both spatial and temporal coherence. Our study introduces a network for expression-based video inpainting that employs generative adversarial networks (GANs) to handle both static and moving occlusions across all frames. By utilizing facial landmarks and an occlusion-free reference image, our model maintains the user's identity consistently across frames. We further preserve emotion through a customized facial expression recognition (FER) loss function, ensuring detailed inpainted outputs. The proposed framework adaptively removes occlusions from facial videos, whether they remain static or move across frames, while producing realistic and coherent results.
URL
https://arxiv.org/abs/2402.09100
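The abstract describes a generator objective that combines adversarial training with a customized FER loss to preserve the subject's expression, but the exact formulation is not given here. The following is a minimal NumPy sketch, under assumed loss terms and weights (a masked L1 reconstruction term over the occluded region, a non-saturating adversarial term, and a cross-entropy FER term against a reference expression label), of how such a composite objective might be assembled; all function names and weight values are illustrative, not the paper's.

```python
import numpy as np

def cross_entropy(logits, target_idx):
    """Numerically stable softmax cross-entropy for one expression label."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target_idx]

def total_loss(pred, gt, mask, d_score, fer_logits_pred, fer_label,
               w_rec=1.0, w_adv=0.1, w_fer=0.5):
    """Hypothetical composite generator loss for occlusion inpainting.

    pred, gt         : inpainted and ground-truth frames (H x W arrays)
    mask             : 1.0 on occluded pixels, 0.0 elsewhere
    d_score          : discriminator output D(G(x)) in (0, 1)
    fer_logits_pred  : expression logits of the inpainted face
    fer_label        : expression class of the occlusion-free reference
    """
    # Masked L1 reconstruction over the occluded region only.
    rec = np.abs((pred - gt) * mask).sum() / max(mask.sum(), 1.0)
    # Non-saturating adversarial term: -log D(G(x)).
    adv = -np.log(d_score + 1e-8)
    # FER term: the inpainted face should keep the reference expression.
    fer = cross_entropy(fer_logits_pred, fer_label)
    return w_rec * rec + w_adv * adv + w_fer * fer
```

A well-inpainted frame (pixels matching ground truth, discriminator fooled, expression classified as the reference label) drives all three terms toward zero, while the relative weights trade off fidelity, realism, and expression preservation.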