Paper Reading AI Learner

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

2023-09-05 12:44:57
Yue Wu, Sicheng Xu, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong

Abstract

Previous animatable 3D-aware GANs for human generation have primarily focused on either the human head or full body. However, head-only videos are relatively uncommon in real life, and full body generation typically does not deal with facial expression control and still has challenges in generating high-quality results. Towards applicable video avatars, we present an animatable 3D-aware GAN that generates portrait images with controllable facial expression, head pose, and shoulder movements. It is a generative model trained on unstructured 2D image collections without using 3D or video data. For the new task, we base our method on the generative radiance manifold representation and equip it with learnable facial and head-shoulder deformations. A dual-camera rendering and adversarial learning scheme is proposed to improve the quality of the generated faces, which is critical for portrait images. A pose deformation processing network is developed to generate plausible deformations for challenging regions such as long hair. Experiments show that our method, trained on unstructured 2D images, can generate diverse and high-quality 3D portraits with desired control over different properties.

Abstract (translated)

之前的人类生成器使用的可模拟3D感知GAN主要关注人类头部或整个身体的生成。然而,在现实生活中,只生成头部视频较为罕见,而且通常不会涉及到面部表情控制和生成高质量结果的挑战。为了适用于视频虚拟角色,我们提出了一种可模拟3D感知的GAN,能够生成可控制面部表情、头部姿势和肩膀运动的肖像图像。这是一个在不使用3D或视频数据的情况下从无向图表示学习到的生成模型。针对新的任务,我们基于生成光度聚类表示来学习生成器和对抗器,并配备了可学习面部表情和头部姿势变形。我们提出了双摄像头渲染和对抗学习方案来改进生成面部质量,这对于肖像图像至关重要。我们还开发了 pose变形处理网络,用于生成令人信服的变形,例如长发。实验表明,我们的方法在无向图表示学习中训练,能够生成各种高质量的3D肖像,并能够对不同属性进行定制控制。

URL

https://arxiv.org/abs/2309.02186

PDF

https://arxiv.org/pdf/2309.02186.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot