Abstract
Head-mounted displays (HMDs) are indispensable devices for observing extended reality (XR) environments and virtual content. However, HMDs obstruct external recording techniques because they occlude the upper face of the user. This limitation significantly affects social XR applications, particularly teleconferencing, where facial features and eye gaze play a vital role in creating an immersive user experience. In this study, we propose a new network for expression-aware video inpainting for HMD removal (EVI-HRnet) based on generative adversarial networks (GANs). Our model effectively fills in the missing region, guided by facial landmarks and a single occlusion-free reference image of the user. Using this reference image, the framework preserves the user's identity across frames. To further improve the realism of the inpainted output, we introduce a novel facial expression recognition (FER) loss function for emotion preservation. Our results demonstrate the capability of the proposed framework to remove HMDs from facial videos while maintaining the subject's facial expression and identity. Moreover, the outputs exhibit temporal consistency across the inpainted frames. This lightweight framework offers a practical approach to HMD occlusion removal, with the potential to enhance a variety of collaborative XR applications without additional hardware.
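The abstract does not specify the exact form of the FER loss. A common way to realize an expression-preservation objective is to compare embeddings of the ground-truth and inpainted frames taken from a frozen, pretrained expression classifier. The sketch below illustrates that idea only; the `fer_embed` projection is a hypothetical stand-in for a real FER network, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained FER network's penultimate-layer
# embedding: a fixed random linear projection (hypothetical — a real
# system would use features from a trained expression classifier).
W = rng.standard_normal((128, 64 * 64))

def fer_embed(frame: np.ndarray) -> np.ndarray:
    """Map a 64x64 grayscale frame to a 128-d expression embedding."""
    return W @ frame.reshape(-1)

def fer_loss(real_frames, fake_frames) -> float:
    """Mean squared distance between expression embeddings of the
    ground-truth and inpainted frames, averaged over the clip."""
    dists = [
        np.mean((fer_embed(r) - fer_embed(f)) ** 2)
        for r, f in zip(real_frames, fake_frames)
    ]
    return float(np.mean(dists))

# Identical clips incur zero expression loss.
frames = [rng.standard_normal((64, 64)) for _ in range(4)]
assert fer_loss(frames, frames) == 0.0
```

In training, such a term would be added to the usual adversarial and reconstruction losses, penalizing inpainted faces whose expression features drift from the ground truth.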
URL
https://arxiv.org/abs/2401.14136