Abstract
Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet, generating 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt. We leverage the Stable Diffusion model as one branch to provide prior knowledge in natural image generation and register it to a second panorama branch for holistic image generation. We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process. Our experiments validate that PanFusion surpasses existing methods and, thanks to its dual-branch structure, can integrate additional constraints like room layout for customized panorama outputs. Code is available at this https URL.
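The projection-aware cross-attention described above can be sketched in a toy form: panorama tokens (queries) attend to perspective-view tokens (keys/values), with attention restricted by a visibility mask encoding which panorama locations project into each perspective view. This is a minimal illustration, not the authors' implementation; the function name, the single-head formulation, and the masking scheme are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_cross_attention(pano_tokens, persp_tokens, vis_mask):
    """Toy projection-aware cross-attention (illustrative, not PanFusion's code).

    pano_tokens:  (Np, d) features from the panorama branch (queries)
    persp_tokens: (Nv, d) features from the perspective branch (keys/values)
    vis_mask:     (Np, Nv) boolean; True where a panorama location is
                  visible in the corresponding perspective view
    """
    d = pano_tokens.shape[-1]
    scores = pano_tokens @ persp_tokens.T / np.sqrt(d)  # (Np, Nv)
    scores = np.where(vis_mask, scores, -1e9)           # suppress invisible pairs
    attn = softmax(scores, axis=-1)
    return attn @ persp_tokens                          # (Np, d)

# Toy example: 6 panorama tokens, 4 perspective tokens, 8-dim features.
rng = np.random.default_rng(0)
pano = rng.standard_normal((6, 8))
persp = rng.standard_normal((4, 8))
mask = rng.random((6, 4)) > 0.3  # hypothetical visibility pattern
out = masked_cross_attention(pano, persp, mask)
print(out.shape)  # (6, 8)
```

In the paper's setting the mask would come from the equirectangular-to-perspective projection geometry rather than random sampling, so each panorama pixel only exchanges information with the views that actually observe it.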
URL
https://arxiv.org/abs/2404.07949