Abstract
Given a 3D mesh, we aim to synthesize 3D textures that correspond to arbitrary textual descriptions. Current methods for generating and assembling textures from sampled views often result in prominent seams or excessive smoothing. To tackle these issues, we present TexGen, a novel multi-view sampling and resampling framework for texture generation that leverages a pre-trained text-to-image diffusion model. For view-consistent sampling, we first maintain a texture map in RGB space that is parameterized by the denoising step and updated after each sampling step of the diffusion model to progressively reduce view discrepancy. An attention-guided multi-view sampling strategy is exploited to broadcast appearance information across views. To preserve texture details, we develop a noise resampling technique that aids in the estimation of noise, generating inputs for subsequent denoising steps, as directed by the text prompt and the current texture map. Through extensive qualitative and quantitative evaluations, we demonstrate that our proposed method produces significantly better texture quality for diverse 3D objects, with a high degree of view consistency and rich appearance details, outperforming current state-of-the-art methods. Furthermore, our texture generation technique can also be applied to texture editing while preserving the original identity. More experimental results are available at this https URL
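At a high level, the abstract describes a loop that alternates per-view denoising with updates to a shared texture map, followed by a noise-resampling step that prepares the input for the next denoising iteration. The sketch below illustrates only that control flow under stated assumptions; the helper functions (render_view, denoise_step, bake_to_texture, resample_noise) are hypothetical placeholders for the rendering, diffusion, fusion, and noise-estimation components, not the authors' implementation.

import numpy as np

# Hypothetical placeholders for the components named in the abstract.
# Shapes, signatures, and update rules are assumptions for illustration only.

def render_view(texture, view):
    """Render the current RGB texture map from one viewpoint (placeholder)."""
    return texture + 0.0 * view

def denoise_step(noisy_view, t, prompt):
    """One text-conditioned denoising step of the diffusion model (placeholder)."""
    return noisy_view * (1.0 - 1.0 / (t + 1))

def bake_to_texture(denoised_views, texture):
    """Fuse per-view estimates back into the shared texture map (placeholder)."""
    return 0.5 * texture + 0.5 * np.mean(denoised_views, axis=0)

def resample_noise(texture, t, prompt, rng):
    """Estimate noise for the next step, guided by the prompt and current texture (placeholder)."""
    return texture + rng.normal(scale=t / 10.0, size=texture.shape)

def texgen_sampling_loop(views, prompt, steps=10, tex_shape=(64, 64, 3), seed=0):
    rng = np.random.default_rng(seed)
    texture = rng.normal(size=tex_shape)  # texture map maintained across denoising steps
    for t in reversed(range(1, steps + 1)):
        # 1) Multi-view sampling: denoise each rendered view at step t.
        denoised = np.stack([denoise_step(render_view(texture, v), t, prompt) for v in views])
        # 2) Update the RGB texture map after every sampling step to reduce view discrepancy.
        texture = bake_to_texture(denoised, texture)
        # 3) Noise resampling: build the input for the subsequent denoising step.
        texture = resample_noise(texture, t, prompt, rng)
    return texture

if __name__ == "__main__":
    tex = texgen_sampling_loop(views=np.arange(4, dtype=float), prompt="a rusty metal bucket")
    print(tex.shape)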
URL
https://arxiv.org/abs/2408.01291