Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution

2023-06-03 11:08:38
Yiji Cheng, Fei Yin, Xiaoke Huang, Xintong Yu, Jiaxiang Liu, Shikun Feng, Yujiu Yang, Yansong Tang

Abstract

Text-to-3D is an emerging task that allows users to create 3D content with infinite possibilities. Existing works tackle the problem by optimizing a 3D representation with guidance from pre-trained diffusion models. An apparent drawback is that they must optimize from scratch for each prompt, which is computationally expensive and often yields poor visual fidelity. In this paper, we propose DreamPortrait, which generates text-guided 3D-aware portraits in a single forward pass for efficiency. To achieve this, we extend Score Distillation Sampling from a datapoint formulation to a distribution formulation, which injects a semantic prior into a 3D distribution. However, a direct extension leads to mode collapse, since the objective pursues only semantic alignment. Hence, we propose to optimize a distribution with hierarchical condition adapters and GAN loss regularization. For better 3D modeling, we further design a 3D-aware gated cross-attention mechanism that lets the model explicitly perceive the correspondence between the text and the 3D-aware space. These designs enable our model to generate portraits with robust multi-view semantic consistency, eliminating the need for optimization-based methods. Extensive experiments demonstrate our model's highly competitive performance and significant speedup over existing methods.

URL

https://arxiv.org/abs/2306.02083

PDF

https://arxiv.org/pdf/2306.02083.pdf

