Abstract
We introduce ObjectAdd, a training-free diffusion modification method that adds user-expected objects into a user-specified area. The motivation for ObjectAdd is twofold: first, describing everything in a single prompt can be difficult, and second, users often need to add objects to an already generated image. To meet these real-world needs, ObjectAdd maintains accurate image consistency after adding objects through three technical innovations: (1) embedding-level concatenation, to ensure the text embeddings coalesce correctly; (2) object-driven layout control with latent and attention injection, to ensure the object lands in the user-specified area; (3) prompted image inpainting with attention refocusing and object expansion, to ensure the rest of the image stays unchanged. Given a text-prompted image, ObjectAdd lets users specify a box and an object, and achieves: (1) an added object inside the box area; (2) exact preservation of content outside the box area; (3) seamless fusion between the two areas.
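The core consistency mechanism (innovation 2 above) relies on keeping the original latents outside the user box while letting new object latents fill the inside. The snippet below is not the paper's implementation; it is a minimal NumPy sketch of that masked latent-injection idea, assuming Stable Diffusion-style latent tensors of shape (C, H, W) and a user box given as (y0, x0, y1, x1) in latent coordinates. The function names `make_box_mask` and `inject_latents` are illustrative, not from the paper.

```python
import numpy as np

def make_box_mask(h, w, box):
    """Binary mask that is 1 inside the user-specified box (y0, x0, y1, x1)."""
    y0, x0, y1, x1 = box
    mask = np.zeros((h, w), dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0
    return mask

def inject_latents(original, edited, mask):
    """Blend latents: keep `original` outside the box, use `edited` inside.

    `original` and `edited` are (C, H, W) latent tensors; `mask` is (H, W).
    """
    return mask[None] * edited + (1.0 - mask[None]) * original

# Toy example on 4x8x8 "latents" (real SD latents would be e.g. 4x64x64).
rng = np.random.default_rng(0)
orig_latents = rng.standard_normal((4, 8, 8)).astype(np.float32)
obj_latents = rng.standard_normal((4, 8, 8)).astype(np.float32)
box_mask = make_box_mask(8, 8, (2, 2, 6, 6))
fused = inject_latents(orig_latents, obj_latents, box_mask)
```

In the actual method this blending would happen at denoising steps inside the diffusion loop (alongside attention injection), not once on final latents; the sketch only shows the spatial-masking arithmetic that guarantees content outside the box is untouched.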
URL
https://arxiv.org/abs/2404.17230