Abstract
We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. Our approach combines recent advances in personalizing text-to-image models (DreamBooth) with text-to-3D generation (DreamFusion). We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-image models overfitting to the input viewpoints of the subject. We overcome this through a 3-stage optimization strategy where we jointly leverage the 3D consistency of neural radiance fields together with the personalization capability of text-to-image models. Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.
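The abstract's 3-stage optimization can be sketched as a pipeline: personalize the text-to-image model, use it to optimize an initial NeRF, render that NeRF from many viewpoints to create subject-faithful multi-view data (breaking the overfitting to input viewpoints), then run a final joint optimization. This is a minimal illustrative sketch, not the authors' code: every function name and return value below is a hypothetical stand-in for the corresponding stage.

```python
# Hedged sketch of the 3-stage DreamBooth3D pipeline described in the
# abstract. All functions are hypothetical placeholders, not real APIs.

def personalize_t2i(t2i_model, subject_images):
    """Stage 1 (part a): DreamBooth-style fine-tuning of a text-to-image
    model on the 3-6 casually captured subject images."""
    return {"base": t2i_model, "subject_images": subject_images}

def optimize_initial_nerf(personalized_t2i, prompt):
    """Stage 1 (part b): DreamFusion-style NeRF optimization guided by the
    personalized model; alone, this overfits to the input viewpoints."""
    return {"nerf": "initial", "prompt": prompt}

def generate_multiview_data(nerf, personalized_t2i, num_views):
    """Stage 2: render the initial NeRF from many viewpoints and translate
    the renders into subject-faithful images, yielding pseudo multi-view
    training data the input photos lack."""
    return [f"pseudo_view_{i}" for i in range(num_views)]

def final_joint_optimization(t2i_model, multiview_images, prompt):
    """Stage 3: re-personalize on the augmented multi-view set and optimize
    the final, 3D-consistent subject-specific NeRF."""
    return {"asset": "final_nerf", "num_views": len(multiview_images)}

def dreambooth3d(t2i_model, subject_images, prompt, num_views=8):
    personalized = personalize_t2i(t2i_model, subject_images)
    initial_nerf = optimize_initial_nerf(personalized, prompt)
    views = generate_multiview_data(initial_nerf, personalized, num_views)
    return final_joint_optimization(t2i_model, views, prompt)

asset = dreambooth3d("t2i-diffusion", ["img1", "img2", "img3"],
                     "a photo of the subject in a novel pose")
```

The key design choice reflected here is that the intermediate NeRF is used only as a source of viewpoint-diverse renders; the final asset is re-optimized from scratch on the augmented data.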
URL
https://arxiv.org/abs/2303.13508