Abstract
While 3D generative models have greatly improved artists' workflows, existing diffusion models for 3D generation suffer from slow generation and poor generalization. To address these issues, we propose a two-stage approach named Hunyuan3D-1.0, which comes in a lite version and a standard version, both of which support text- and image-conditioned generation. In the first stage, we employ a multi-view diffusion model that efficiently generates multi-view RGB images in approximately 4 seconds. These multi-view images capture rich details of the 3D asset from different viewpoints, relaxing the task from single-view to multi-view reconstruction. In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset from the generated multi-view images in approximately 7 seconds. The reconstruction network learns to handle the noise and inconsistency introduced by the multi-view diffusion and leverages the available information from the condition image to efficiently recover the 3D structure. Extensive experimental results demonstrate the effectiveness of Hunyuan3D-1.0 in generating high-quality 3D assets. Our framework incorporates a text-to-image model, i.e., Hunyuan-DiT, making it a unified framework that supports both text- and image-conditioned 3D generation. Our standard version has 10× more parameters than our lite version and other existing models. Hunyuan3D-1.0 achieves an impressive balance between speed and quality, significantly reducing generation time while maintaining the quality and diversity of the produced assets.
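The two-stage, unified pipeline described in the abstract can be sketched as plain control flow. This is a minimal illustration under stated assumptions, not the Hunyuan3D-1.0 implementation: every class and function here (`Image`, `Mesh`, `text_to_image`, `multiview_diffusion`, `feed_forward_reconstruct`, `generate_3d`) is a hypothetical stub, and the default of six generated views is an assumed value that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stubs for illustration only; these are NOT the Hunyuan3D-1.0 API.


@dataclass
class Image:
    """Placeholder for an RGB image (a condition image or one generated view)."""
    description: str


@dataclass
class Mesh:
    """Placeholder for a reconstructed 3D asset."""
    source_views: int
    condition: str


def text_to_image(prompt: str) -> Image:
    # Stand-in for a text-to-image model such as Hunyuan-DiT.
    return Image(description=f"image of: {prompt}")


def multiview_diffusion(condition: Image, num_views: int = 6) -> List[Image]:
    # Stage 1 stand-in: produce several views of the object from one condition image.
    # num_views=6 is an assumption; the abstract gives no view count.
    return [Image(description=f"{condition.description} @ view {i}") for i in range(num_views)]


def feed_forward_reconstruct(views: List[Image], condition: Image) -> Mesh:
    # Stage 2 stand-in: a single forward pass from the generated views plus the
    # original condition image to a 3D asset (no per-asset optimization loop).
    return Mesh(source_views=len(views), condition=condition.description)


def generate_3d(prompt: Optional[str] = None, image: Optional[Image] = None) -> Mesh:
    """Unified entry point: a text prompt is first turned into a condition image."""
    if (prompt is None) == (image is None):
        raise ValueError("provide exactly one of `prompt` or `image`")
    condition = text_to_image(prompt) if prompt is not None else image
    views = multiview_diffusion(condition)              # stage 1
    return feed_forward_reconstruct(views, condition)   # stage 2


if __name__ == "__main__":
    mesh = generate_3d(prompt="a wooden chair")
    print(mesh.source_views, mesh.condition)
```

In this sketch, `generate_3d(prompt=...)` routes the prompt through the text-to-image stand-in before the two stages, while `generate_3d(image=...)` skips that step. In both cases the condition image is also passed to stage 2, mirroring the abstract's point that the reconstruction network draws on information from the condition image.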
Abstract (translated)
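Although 3D generative models have greatly improved artists' workflows, existing diffusion models for 3D generation still suffer from slow generation and poor generalization. To address this problem, we propose a two-stage method named Hunyuan3D-1.0, which includes a lite version and a standard version, both of which support text- and image-conditioned generation. In the first stage, we adopt a multi-view diffusion model that efficiently generates multi-view RGB images in about 4 seconds. These multi-view images capture rich details of the 3D asset from different viewpoints, shifting the task from single-view reconstruction to multi-view reconstruction. In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset from the generated multi-view images in about 7 seconds. The reconstruction network learns to handle the noise and inconsistency introduced by multi-view diffusion, and uses the available information in the condition image to efficiently recover the 3D structure. Extensive experimental results demonstrate the effectiveness of Hunyuan3D-1.0 in generating high-quality 3D assets. Our framework incorporates a text-to-image model, namely Hunyuan-DiT, making it a unified framework that supports both text- and image-conditioned 3D generation. Compared with the lite version and other existing models, our standard version has about 10 times more parameters. Hunyuan3D-1.0 strikes an excellent balance between speed and quality, significantly reducing generation time while preserving the quality and diversity of the generated assets.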
URL
https://arxiv.org/abs/2411.02293