Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement

2024-04-15 08:04:44
Chi Wang, Junming Huang, Rong Zhang, Qi Wang, Haotian Yang, Haibin Huang, Chongyang Ma, Weiwei Xu

Abstract

Automatic 3D facial texture generation has gained significant interest recently. Existing approaches either may not support the traditional physically based rendering (PBR) pipeline or rely on 3D data captured by a Light Stage. Our key contribution is a progressive latent-space refinement approach that can bootstrap from 3D Morphable Model (3DMM)-based texture maps generated from facial images to produce high-quality and diverse PBR textures, including albedo, normal, and roughness maps. It starts by enhancing Generative Adversarial Networks (GANs) for text-guided and diverse texture generation. To this end, we design a self-supervised paradigm that overcomes the reliance on ground-truth 3D textures and trains the generative model with only entangled texture maps. In addition, we foster mutual enhancement between GANs and Score Distillation Sampling (SDS): SDS boosts GANs with more generative modes, while GANs make SDS optimization more efficient. Furthermore, we introduce an edge-aware SDS for multi-view consistent facial structure. Experiments demonstrate that our method outperforms existing 3D texture generation methods in photorealistic quality, diversity, and efficiency.
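The abstract builds on Score Distillation Sampling (SDS). As background only, here is a minimal NumPy sketch of the standard single-sample SDS estimator popularized by DreamFusion, grad = w(t) * (eps_hat(x_t, t) - eps), with dx/dtheta taken as the identity. It is not the paper's edge-aware SDS, whose details the abstract does not give. The toy point-mass denoiser (standing in for a text-conditioned diffusion model), the linear noise schedule, the weighting w(t) = 1 - a_bar, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def sds_gradient(x, eps_pred_fn, alphas_cumprod, rng):
    """One Monte Carlo sample of the standard SDS gradient w.r.t. image x.

    Assumption: theta is the image itself, so dx/dtheta is the identity.
    """
    t = int(rng.integers(1, len(alphas_cumprod)))          # random timestep
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x.shape)                     # injected noise
    x_t = np.sqrt(a_bar) * x + np.sqrt(1.0 - a_bar) * eps  # forward diffusion
    eps_hat = eps_pred_fn(x_t, t, a_bar)                   # frozen denoiser
    w = 1.0 - a_bar                                        # assumed weighting
    return w * (eps_hat - eps)


def make_point_mass_denoiser(mu):
    """Hypothetical toy denoiser: the exact noise predictor when all 'data'
    equals the single image mu. Receiving a_bar directly is a toy shortcut."""
    def eps_pred(x_t, t, a_bar):
        return (x_t - np.sqrt(a_bar) * mu) / np.sqrt(1.0 - a_bar)
    return eps_pred


def run_sds(x0, mu, steps=500, lr=1.0, seed=0):
    """Plain gradient descent on x using the SDS gradient above."""
    rng = np.random.default_rng(seed)
    alphas_cumprod = np.linspace(0.999, 0.01, 1000)  # toy noise schedule
    denoiser = make_point_mass_denoiser(mu)
    x = np.array(x0, dtype=float)                     # work on a float copy
    for _ in range(steps):
        x -= lr * sds_gradient(x, denoiser, alphas_cumprod, rng)
    return x


# Usage: optimize a random 4x4 "texture" toward the denoiser's mode.
mu = np.full((4, 4), 0.5)
x = run_sds(np.random.default_rng(1).standard_normal((4, 4)), mu)
```

With this exact point-mass denoiser the noise terms cancel and each step reduces to x - mu scaled by (1 - sqrt(a_bar * (1 - a_bar))), so x is pulled toward mu. With a real diffusion model the estimator is noisy and x would instead be a rendering of the texture parameters.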

URL

https://arxiv.org/abs/2404.09540

PDF

https://arxiv.org/pdf/2404.09540.pdf