Abstract
Generating background scenes for salient objects plays a crucial role across domains including creative design and e-commerce, as it enhances the presentation and context of subjects by integrating them into tailored environments. Background generation can be framed as text-conditioned outpainting, where the goal is to extend image content beyond a salient object's boundaries on a blank background. Although popular diffusion models for text-guided inpainting can also be used for outpainting via mask inversion, they are trained to fill in missing parts of an image rather than to place an object into a scene. Consequently, when used for background creation, inpainting models frequently extend the salient object's boundaries and thereby change its identity, a phenomenon we call "object expansion." This paper introduces a model that adapts inpainting diffusion models to the salient object outpainting task using Stable Diffusion and ControlNet architectures. We present a series of qualitative and quantitative results across models and datasets, including a newly proposed metric for measuring object expansion that requires no human labeling. Compared to Stable Diffusion 2.0 Inpainting, our proposed approach reduces object expansion by 3.6x on average with no degradation in standard visual metrics across multiple datasets.
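The mask-inversion idea mentioned above can be sketched briefly. An inpainting model fills the region where the mask is white, so to generate a background around a salient object, the object mask is inverted before being handed to the model. The helper name below is hypothetical; only the inversion step itself comes from the abstract.

```python
import numpy as np

def outpainting_mask(object_mask: np.ndarray) -> np.ndarray:
    """Invert a binary salient-object mask so that an inpainting
    model fills the background rather than the object region.

    object_mask: uint8 array, 255 where the salient object is, 0 elsewhere.
    Returns a uint8 array that is 255 where background should be generated.
    """
    return 255 - object_mask

# Toy example: a 4x4 image with the object in the top-left 2x2 corner.
obj = np.zeros((4, 4), dtype=np.uint8)
obj[:2, :2] = 255
bg = outpainting_mask(obj)
# bg is 0 over the object and 255 over the background region.
```

In practice the inverted mask would be passed as the `mask_image` argument of an inpainting pipeline such as `StableDiffusionInpaintPipeline` from the `diffusers` library; as the abstract notes, doing so naively tends to expand the object's boundaries, which is the failure mode the paper addresses.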
URL
https://arxiv.org/abs/2404.10157