Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

Abstract

Existing neural rendering-based text-to-3D-portrait generation methods typically make use of a human geometry prior and diffusion models to obtain guidance. However, relying solely on geometry information introduces issues such as the Janus problem, over-saturation, and over-smoothing. We present Portrait3D, a neural rendering-based framework with a novel joint geometry-appearance prior that achieves text-to-3D-portrait generation while overcoming these issues. To accomplish this, we train a 3D portrait generator, 3DPortraitGAN-Pyramid, as a robust prior. This generator is capable of producing 360° canonical 3D portraits, which serve as a starting point for the subsequent diffusion-based generation process. To mitigate the "grid-like" artifacts caused by high-frequency information in the feature-map-based 3D representation used by most 3D-aware GANs, we integrate a novel pyramid tri-grid 3D representation into 3DPortraitGAN-Pyramid. To generate 3D portraits from text, we first project a randomly generated image aligned with the given prompt into the pre-trained 3DPortraitGAN-Pyramid's latent space. The resulting latent code is then used to synthesize a pyramid tri-grid. Beginning with the obtained pyramid tri-grid, we use score distillation sampling to distill the diffusion model's knowledge into the pyramid tri-grid. Following that, we use the diffusion model to refine the rendered images of the 3D portrait and then use these refined images as training data to further optimize the pyramid tri-grid, effectively eliminating unrealistic colors and unnatural artifacts. Our experimental results show that Portrait3D can produce realistic, high-quality, and canonical 3D portraits that align with the prompt.
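
The abstract does not spell out how features are read from a pyramid tri-grid, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It treats each pyramid level as a plain tri-plane (three axis-aligned 2D feature planes, ignoring any extra depth axis a tri-grid may carry), samples each plane bilinearly at the 3D point's 2D projection, and combines levels by summation. The functions `make_triplane`, `bilinear`, and `query_pyramid` are hypothetical names introduced for illustration. The point it shows: coarse levels vary slowly across space, so they contribute smooth, low-frequency features alongside the finer levels.

```python
import math
import random
from typing import List

# A feature plane stored as nested lists: plane[row][col] -> list of C channel values.
Plane = List[List[List[float]]]
# One pyramid level (assumed tri-plane layout): three planes in the order (xy, xz, yz).
TriPlane = List[Plane]


def make_triplane(res: int, channels: int, rng: random.Random) -> TriPlane:
    """Random res x res tri-plane; a hypothetical stand-in for generator output."""
    return [
        [[[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(res)]
         for _ in range(res)]
        for _ in range(3)
    ]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample `plane` at normalized coords u (columns), v (rows) in [-1, 1]."""
    h, w = len(plane), len(plane[0])
    u = min(max(u, -1.0), 1.0)
    v = min(max(v, -1.0), 1.0)
    # Map [-1, 1] onto [0, size - 1] so that -1 and 1 land exactly on the edge texels.
    x = (u + 1.0) * 0.5 * (w - 1)
    y = (v + 1.0) * 0.5 * (h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = (1 - fx) * plane[y0][x0][c] + fx * plane[y0][x1][c]
        bottom = (1 - fx) * plane[y1][x0][c] + fx * plane[y1][x1][c]
        out.append((1 - fy) * top + fy * bottom)
    return out


def query_pyramid(pyramid: List[TriPlane], x: float, y: float, z: float) -> List[float]:
    """Feature at 3D point (x, y, z): sum of bilinear samples over all planes and levels."""
    coords = ((x, y), (x, z), (y, z))  # projections onto the xy, xz, yz planes
    feat = [0.0] * len(pyramid[0][0][0][0])
    for triplane in pyramid:  # one tri-plane per pyramid level
        for plane, (u, v) in zip(triplane, coords):
            for c, val in enumerate(bilinear(plane, u, v)):
                feat[c] += val
    return feat


if __name__ == "__main__":
    rng = random.Random(0)
    # Three levels at increasing resolution, 4 feature channels each.
    pyramid = [make_triplane(res, 4, rng) for res in (8, 16, 32)]
    print(query_pyramid(pyramid, 0.1, -0.3, 0.5))
```

Summation is just one simple aggregation choice; concatenation or learned weighting would fit the same structure. In EG3D-style pipelines the aggregated feature would then be decoded by a small MLP into density and color, which this sketch leaves out.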

Abstract (translated)

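Existing neural rendering-based text-to-3D-portrait generation methods typically use human geometry information and diffusion models to obtain guidance. However, relying solely on geometry information introduces problems such as the Janus problem, over-saturation, and over-smoothing. We propose Portrait3D, a neural rendering-based framework with a novel joint geometry-appearance prior for text-to-3D-portrait generation that overcomes these problems. To this end, we train a 3D portrait generator, 3DPortraitGAN-Pyramid, as a robust prior. This generator can produce 360° canonical 3D portraits, which serve as the starting point for the subsequent diffusion-based generation process. To mitigate the "grid-like" artifacts caused by high-frequency information, we integrate a novel pyramid tri-grid 3D representation into 3DPortraitGAN-Pyramid. To generate a 3D portrait from text, we first project a randomly generated image aligned with the given prompt into the latent space of the pre-trained 3DPortraitGAN-Pyramid. The resulting latent code is then used to synthesize a pyramid tri-grid. Starting from this pyramid tri-grid, we use score distillation sampling to distill the diffusion model's knowledge into it. Next, we use the diffusion model to refine the rendered images of the 3D portrait and then use these refined images as training data to further optimize the pyramid tri-grid, effectively eliminating unrealistic colors and unnatural artifacts. Our experimental results show that Portrait3D can generate realistic, high-quality, canonical 3D portraits that are consistent with the given prompt.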
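
Score distillation sampling, introduced in DreamFusion, updates 3D parameters θ with the gradient E_t,ε[ w(t) (ε̂(x_t; y, t) - ε) ∂x/∂θ ], where x is a rendered image, x_t = sqrt(ᾱ_t) x + sqrt(1 - ᾱ_t) ε is its noised version, and ε̂ is the diffusion model's noise prediction for prompt y. The toy below illustrates only that update rule; it is not Portrait3D's training code. Assumptions: the "renderer" is the identity (θ is the image itself), the diffusion model is replaced by a hypothetical closed-form denoiser that treats a single `target` vector as the entire data distribution, w(t) = 1 - ᾱ_t, and `sds_grad` and `optimize` are illustrative names.

```python
import math
import random
from typing import List


def sds_grad(x: List[float], target: List[float], alpha_bar: float,
             rng: random.Random) -> List[float]:
    """One SDS gradient estimate w.r.t. the image x (identity renderer assumed)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    # Forward diffusion: x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps
    x_t = [a * xi + s * e for xi, e in zip(x, eps)]
    # Hypothetical toy denoiser: exact noise prediction if the only "data" were `target`.
    eps_hat = [(xt - a * ti) / s for xt, ti in zip(x_t, target)]
    w = 1.0 - alpha_bar  # weighting w(t) = 1 - alpha_bar_t (assumed choice)
    # SDS skips the denoiser Jacobian: grad = w(t) * (eps_hat - eps) * dx/dtheta, dx/dtheta = I.
    return [w * (eh - e) for eh, e in zip(eps_hat, eps)]


def optimize(x: List[float], target: List[float], steps: int = 200,
             lr: float = 0.5, seed: int = 0) -> List[float]:
    """Plain gradient descent on x using SDS gradients at random noise levels."""
    rng = random.Random(seed)
    for _ in range(steps):
        alpha_bar = rng.uniform(0.02, 0.98)  # stands in for sampling a timestep t
        g = sds_grad(x, target, alpha_bar, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    result = optimize([0.0, 0.0, 0.0], [1.0, -2.0, 0.5])
    print([round(v, 4) for v in result])
```

With this toy denoiser the sampled noise cancels, and the gradient reduces to sqrt(ᾱ_t) sqrt(1 - ᾱ_t) (x - target), so each step contracts x toward the target. A real text-conditioned diffusion model offers no such closed form.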

URL

https://arxiv.org/abs/2404.10394

PDF

https://arxiv.org/pdf/2404.10394.pdf
