Abstract
Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this, we propose a novel approach that exploits the step-by-step generation process of diffusion models, which generate images from low- to high-frequency information, providing a new perspective on representing, generating, and editing images. We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer stronger disentanglement and controllability than existing methods. We apply ProSpect to various personalized, attribute-aware image generation applications, such as image- or text-guided transfer and editing of material, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models.
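To make the per-stage conditioning concrete, below is a minimal sketch (not the authors' code) of the idea the abstract describes: the T denoising timesteps are partitioned into N consecutive stages, and each stage is conditioned on its own inverted token embedding rather than a single fixed prompt. The names `stage_embeddings`, `embedding_for_step`, and the commented-out `denoise_step` call are hypothetical placeholders, and the embeddings here are random stand-ins for the ones ProSpect would invert from a reference image.

```python
import torch

T = 1000   # total diffusion timesteps (typical for Stable Diffusion; an assumption here)
N = 10     # number of generation stages, each a group of consecutive steps
D = 768    # dimensionality of one text-token embedding (CLIP-like; an assumption)

# One learned token embedding per stage. In ProSpect these would be
# inverted from the reference image; random placeholders for this sketch.
stage_embeddings = [torch.randn(D) for _ in range(N)]

def embedding_for_step(t: int) -> torch.Tensor:
    """Map timestep t (T-1 .. 0, high noise -> low noise) to its stage's embedding.

    Early, high-noise steps shape low-frequency content such as layout;
    late, low-noise steps refine high-frequency detail such as material.
    """
    stage = min(N - 1, (T - 1 - t) * N // T)  # stage 0 = earliest (noisiest) steps
    return stage_embeddings[stage]

# Inside a (mock) denoising loop, the text conditioning is swapped per stage
# instead of staying fixed for all steps; editing one stage's embedding then
# affects only the attributes formed at that stage.
x = torch.randn(4, 64, 64)                 # latent being denoised (placeholder shape)
for t in reversed(range(T)):
    cond = embedding_for_step(t)
    # x = denoise_step(x, t, cond)         # hypothetical denoiser call
```

Under this reading, attribute-aware editing amounts to replacing the embeddings of only the stages responsible for a given attribute (e.g., the early stages for layout) while leaving the rest untouched.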
URL
https://arxiv.org/abs/2305.16225