Abstract
Recent diffusion-based single-image 3D portrait generation methods typically employ 2D diffusion models to provide multi-view knowledge, which is then distilled into 3D representations. However, these methods usually struggle to produce high-fidelity 3D models, frequently yielding excessively blurred textures. We attribute this issue to insufficient consideration of cross-view consistency during the diffusion process, which results in significant disparities between different views and ultimately leads to blurred 3D representations. In this paper, we address this issue by comprehensively exploiting multi-view priors in both the conditioning and diffusion procedures to produce consistent, detail-rich portraits. From the conditioning standpoint, we propose a Hybrid Priors Diffusion model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the consistency of the generated multi-view portraits. From the diffusion perspective, considering the significant impact of the diffusion noise distribution on detailed texture generation, we propose a Multi-View Noise Resampling Strategy, integrated within the optimization process, that leverages cross-view priors to enhance representation consistency. Extensive experiments demonstrate that our method can produce 3D portraits with accurate geometry and rich details from a single image. The project page is at \url{this https URL}.
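The core intuition behind the noise-resampling idea can be illustrated with a minimal sketch: instead of drawing fully independent Gaussian noise per view, blend a shared base noise with view-specific noise so that different views share structure while preserving unit variance. This is a hypothetical illustration of the general principle, not the paper's actual implementation; the function name, the `mix` parameter, and the blending rule are all assumptions for exposition.

```python
import numpy as np

def resample_multiview_noise(n_views, shape, mix=0.5, rng=None):
    """Hypothetical sketch of cross-view-correlated noise sampling.

    Blends one shared noise map with independent per-view noise:
        eps_v = sqrt(mix) * eps_shared + sqrt(1 - mix) * eps_v_indep
    Both terms are standard normal, so each eps_v keeps unit variance.
    mix=0 gives fully independent views; mix=1 gives identical views.
    """
    rng = rng or np.random.default_rng(0)
    shared = rng.standard_normal(shape)  # noise common to all views
    views = []
    for _ in range(n_views):
        indep = rng.standard_normal(shape)  # per-view noise
        views.append(np.sqrt(mix) * shared + np.sqrt(1.0 - mix) * indep)
    return np.stack(views)  # (n_views, *shape)
```

With `mix=0.8`, any two views' noise maps are strongly correlated (correlation ≈ mix), which is the kind of cross-view coupling that would discourage the per-view disparities the abstract attributes blurring to.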
URL
https://arxiv.org/abs/2411.10369