Abstract
Diffusion-based text-to-image personalization has achieved great success in generating user-specified subjects across various contexts. Nevertheless, existing finetuning-based methods still suffer from model overfitting, which greatly harms generative diversity, especially when only a few subject images are given. To this end, we propose Pick-and-Draw, a training-free semantic guidance approach that boosts identity consistency and generative diversity for personalization methods. Our approach consists of two components: appearance picking guidance and layout drawing guidance. For the former, we construct an appearance palette from visual features of the reference image, from which we pick local patterns to generate the specified subject with a consistent identity. For layout drawing, we outline the subject's contour by referring to a generative template from the vanilla diffusion model, inheriting its strong image prior to synthesize diverse contexts according to different text conditions. The proposed approach can be applied to any personalized diffusion model and requires as few as a single reference image. Qualitative and quantitative experiments show that Pick-and-Draw consistently improves identity consistency and generative diversity, pushing the trade-off between subject fidelity and image-text fidelity to a new Pareto frontier.
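The abstract gives no implementation details, so the following is only a minimal sketch of how a training-free, two-term guidance of this kind could be wired into a sampler. All names (appearance_picking_loss, layout_drawing_loss, guided_score, the feature/attention extractor callables, and the loss weights) are illustrative assumptions, not the paper's actual code.

# Hypothetical sketch: classifier-guidance-style steering of a personalized
# diffusion model with (1) an appearance term that pulls generated subject
# features toward "palette" features picked from the reference image, and
# (2) a layout term that keeps the subject's attention map close to a
# template produced by the vanilla (non-finetuned) model.
import torch
import torch.nn.functional as F

def appearance_picking_loss(gen_feats, palette_feats):
    # gen_feats: (N, C) features inside the generated subject region
    # palette_feats: (M, C) local patterns extracted from the reference image
    sim = F.normalize(gen_feats, dim=-1) @ F.normalize(palette_feats, dim=-1).T
    nearest = sim.argmax(dim=-1)              # pick the best-matching local pattern
    return F.mse_loss(gen_feats, palette_feats[nearest])

def layout_drawing_loss(subject_attn, template_attn):
    # Keep the subject's layout (e.g., a cross-attention map) near the
    # generative template's layout, preserving the vanilla model's image prior.
    return F.mse_loss(subject_attn, template_attn)

def guided_score(latent, t, personalized_eps, gen_feats_fn, subject_attn_fn,
                 palette_feats, template_attn, w_app=1.0, w_lay=1.0):
    # One denoising step: add the gradient of the combined guidance loss to
    # the personalized model's noise prediction (training-free steering).
    latent = latent.detach().requires_grad_(True)
    loss = (w_app * appearance_picking_loss(gen_feats_fn(latent, t), palette_feats)
            + w_lay * layout_drawing_loss(subject_attn_fn(latent, t), template_attn))
    grad = torch.autograd.grad(loss, latent)[0]
    return personalized_eps(latent, t) + grad

In this toy form, the caller supplies the personalized model's noise predictor and the feature/attention extractors; the sketch only illustrates how the two guidance terms described in the abstract could jointly steer each sampling step.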
URL
https://arxiv.org/abs/2401.16762