Paper Reading AI Learner

Unsupervised Style-based Explicit 3D Face Reconstruction from Single Image

2023-04-24 21:25:06
Heng Yu, Zoltan A. Milacski, Laszlo A. Jeni

Abstract

Inferring 3D object structures from a single image is an ill-posed task due to depth ambiguity and occlusion. Common remedies in the literature include leveraging 2D or 3D ground truth for supervised learning, as well as imposing hand-crafted symmetry priors or using an implicit representation to hallucinate novel viewpoints for unsupervised methods. In this work, we propose a general adversarial learning framework for solving Unsupervised 2D to Explicit 3D Style Transfer (UE3DST). Specifically, we merge two architectures: the unsupervised explicit 3D reconstruction network of Wu et al. and the Generative Adversarial Network (GAN) named StarGAN-v2. We experiment across three facial datasets (Basel Face Model, 3DFAW and CelebA-HQ) and show that our solution outperforms well-established methods such as DepthNet in 3D reconstruction and Pix2NeRF in conditional style transfer, while we also verify the individual contributions of our model components via ablation studies. In contrast to the aforementioned baselines, our scheme produces features for explicit 3D rendering, which can be manipulated and utilized in downstream tasks.
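The explicit branch referenced above (Wu et al.) factors each image into depth, albedo, lighting, and viewpoint, then re-renders it with a differentiable Lambertian shader. As a rough illustration of that shading step only (not the authors' code; the function names and the toy depth bump below are our own assumptions), a minimal NumPy sketch:

```python
import numpy as np

def normals_from_depth(depth):
    """Approximate per-pixel surface normals from a depth map via finite differences."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def lambertian_render(albedo, depth, light_dir, ambient=0.2):
    """Shade an albedo map with one directional light under a Lambertian model."""
    light_dir = np.asarray(light_dir, dtype=float)
    light_dir = light_dir / np.linalg.norm(light_dir)
    normals = normals_from_depth(depth)
    diffuse = np.clip(normals @ light_dir, 0.0, None)    # (H, W) cosine term
    shading = ambient + (1.0 - ambient) * diffuse        # mix ambient + diffuse
    return albedo * shading[..., None]                   # (H, W, 3) shaded image

# Toy example: a spherical-cap depth bump with flat albedo, lit from the upper left.
h = w = 32
ys, xs = np.mgrid[0:h, 0:w]
r2 = ((xs - w / 2) ** 2 + (ys - h / 2) ** 2) / (w / 2) ** 2
depth = np.sqrt(np.clip(1.0 - r2, 0.0, None))            # dome, 0 at the edges
albedo = np.full((h, w, 3), 0.8)
image = lambertian_render(albedo, depth, light_dir=[-1.0, -1.0, 1.0])
```

In the full model these factors are predicted by separate networks and the renderer is differentiable, so the reconstruction loss on the re-rendered image trains the decomposition end to end without 3D supervision.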


URL

https://arxiv.org/abs/2304.12455

PDF

https://arxiv.org/pdf/2304.12455.pdf

