Ablating Concepts in Text-to-Image Diffusion Models

Abstract

Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.
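The matching idea described above can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the abstract does not state the loss, so the squared-error objective against a frozen copy of the model is an assumption, as are the per-concept affine "predictor", the function names, and the concept strings "grumpy cat" (target) and "cat" (anchor). A real diffusion model shares its weights across prompts, which this toy ignores.

```python
import random

def predict(params, x, concept):
    """Toy conditional predictor (assumption): one affine map per concept."""
    scale, shift = params[concept]
    return scale * x + shift

def ablation_loss(params, frozen, xs, target, anchor):
    """Mean squared gap between the trainable target-conditioned output
    and the frozen anchor-conditioned output."""
    gaps = [(predict(params, x, target) - predict(frozen, x, anchor)) ** 2 for x in xs]
    return sum(gaps) / len(gaps)

def ablate(params, xs, target, anchor, lr=0.1, steps=500):
    """Gradient descent on the target concept's parameters only."""
    frozen = {k: tuple(v) for k, v in params.items()}  # fixed reference copy
    scale, shift = params[target]
    n = len(xs)
    for _ in range(steps):
        g_scale = g_shift = 0.0
        for x in xs:
            err = (scale * x + shift) - predict(frozen, x, anchor)
            g_scale += 2 * err * x / n  # d(loss)/d(scale)
            g_shift += 2 * err / n      # d(loss)/d(shift)
        scale -= lr * g_scale
        shift -= lr * g_shift
    updated = dict(params)
    updated[target] = (scale, shift)
    return updated, frozen

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(64)]  # stand-in for noisy inputs
params = {"grumpy cat": (2.0, -1.0), "cat": (0.5, 0.3)}
ablated, frozen = ablate(params, xs, target="grumpy cat", anchor="cat")
```

After training, the target-conditioned output of this toy coincides with the anchor-conditioned one, while the anchor's own parameters are untouched. That mirrors, in miniature, the stated goal of suppressing the target concept while preserving related ones.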


URL

https://arxiv.org/abs/2303.13516

PDF

https://arxiv.org/pdf/2303.13516.pdf