High-Fidelity Eye Animatable Neural Radiance Fields for Human Face

2023-08-01 18:26:55
Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang

Abstract

Face rendering using neural radiance fields (NeRF) is a rapidly developing research area in computer vision. While recent methods primarily focus on controlling facial attributes such as identity and expression, they often overlook the crucial aspect of modeling eyeball rotation, which holds importance for various downstream tasks. In this paper, we aim to learn a face NeRF model that is sensitive to eye movements from multi-view images. We address two key challenges in eye-aware face NeRF learning: how to effectively capture eyeball rotation for training and how to construct a manifold for representing eyeball rotation. To accomplish this, we first fit FLAME, a well-established parametric face model, to the multi-view images considering multi-view consistency. Subsequently, we introduce a new Dynamic Eye-aware NeRF (DeNeRF). DeNeRF transforms 3D points from different views into a canonical space to learn a unified face NeRF model. We design an eye deformation field for the transformation, including rigid transformation, e.g., eyeball rotation, and non-rigid transformation. Through experiments conducted on the ETH-XGaze dataset, we demonstrate that our model is capable of generating high-fidelity images with accurate eyeball rotation and non-rigid periocular deformation, even under novel viewing angles. Furthermore, we show that utilizing the rendered images can effectively enhance gaze estimation performance.
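The rigid part of the eye deformation field described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the parameterization, so the pitch/yaw convention, the `eye_center` pivot, and all function names below are assumptions made for illustration. The non-rigid periocular deformation is left out. The idea shown is only that a point sampled on a rotated eyeball can be mapped back to a shared canonical space by undoing the rotation about the eyeball center.

```python
import math


def rotation_from_pitch_yaw(pitch, yaw):
    """Build R = Ry(yaw) @ Rx(pitch), angles in radians (assumed convention)."""
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    return [[sum(ry[i][k] * rx[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix; for a rotation this is also its inverse."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def to_observed(point, eye_center, rotation):
    """Rotate a canonical point about the eyeball center: x_o = R (x_c - c) + c."""
    local = [p - c for p, c in zip(point, eye_center)]
    rotated = matvec(rotation, local)
    return [r + c for r, c in zip(rotated, eye_center)]


def to_canonical(point, eye_center, rotation):
    """Undo the eyeball rotation: x_c = R^T (x_o - c) + c."""
    local = [p - c for p, c in zip(point, eye_center)]
    unrotated = matvec(transpose(rotation), local)
    return [u + c for u, c in zip(unrotated, eye_center)]


if __name__ == "__main__":
    # Hypothetical eyeball center and gaze angles, chosen only for the demo.
    center = [0.03, 0.02, 0.0]
    R = rotation_from_pitch_yaw(math.radians(10.0), math.radians(-20.0))
    canonical_point = [0.03, 0.02, 0.012]  # a point on the front of the eyeball
    observed_point = to_observed(canonical_point, center, R)
    print(observed_point)
    print(to_canonical(observed_point, center, R))
```

Under these assumptions, a radiance field queried at the output of `to_canonical` sees the eyeball in one fixed pose regardless of gaze direction, which is what lets a single model be learned across frames with different eyeball rotations.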

URL

https://arxiv.org/abs/2308.00773

PDF

https://arxiv.org/pdf/2308.00773.pdf

