Abstract
Diffusion models (DMs) usher in a new era of generative modeling and offer new opportunities for efficiently generating high-quality, realistic data samples. However, their widespread use has also brought forth new challenges in model security, motivating the development of more effective adversarial attacks on DMs to understand their vulnerabilities. We propose CAAT, a simple but generic and efficient approach that does not require costly training to effectively fool latent diffusion models (LDMs). The approach is based on the observation that cross-attention layers exhibit higher sensitivity to gradient change, allowing subtle perturbations on published images to significantly corrupt the generated images. We show that a subtle perturbation on an image can significantly impact the cross-attention layers, thus changing the mapping between text and image during the fine-tuning of customized diffusion models. Extensive experiments demonstrate that CAAT is compatible with diverse diffusion models and outperforms baseline attack methods, being both more effective (introducing more noise) and more efficient (running twice as fast as Anti-DreamBooth and Mist).
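To make the attack mechanism concrete, the sketch below shows one plausible PGD-style realization under stated assumptions: it ascends the gradient of the standard LDM denoising loss with respect to the input pixels, so that fine-tuning a customized model on the perturbed image corrupts the learned text-image mapping. This is not the authors' released implementation; CAAT specifically exploits the gradient sensitivity of the cross-attention layers, which this simplified version approximates by attacking the full denoising objective. The model checkpoint, perturbation budget, step size, and the helper name caat_style_perturb are illustrative assumptions, and the code assumes PyTorch and the Hugging Face diffusers library.

# Minimal sketch, NOT the authors' code: PGD-style perturbation that maximizes
# the LDM denoising loss w.r.t. the input image. CAAT targets cross-attention
# gradients specifically; here we approximate with the full training objective.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32  # assumed checkpoint
).to(device)
for m in (pipe.vae, pipe.unet, pipe.text_encoder):
    m.requires_grad_(False)  # weights stay frozen; only the image pixels are optimized

def caat_style_perturb(image, prompt, eps=8 / 255, step=1 / 255, iters=40):
    """image: (1, 3, H, W) tensor in [-1, 1]; returns a perturbed copy. Budget values are illustrative."""
    tok = pipe.tokenizer(prompt, padding="max_length",
                         max_length=pipe.tokenizer.model_max_length,
                         truncation=True, return_tensors="pt").to(device)
    text_emb = pipe.text_encoder(tok.input_ids)[0]

    adv = image.clone().detach().to(device)
    for _ in range(iters):
        adv.requires_grad_(True)
        # Encode to latents and simulate one step of the diffusion training loss
        latents = pipe.vae.encode(adv).latent_dist.sample() * pipe.vae.config.scaling_factor
        noise = torch.randn_like(latents)
        t = torch.randint(0, pipe.scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=device)
        noisy = pipe.scheduler.add_noise(latents, noise, t)
        pred = pipe.unet(noisy, t, encoder_hidden_states=text_emb).sample
        loss = F.mse_loss(pred, noise)  # standard noise-prediction objective
        grad = torch.autograd.grad(loss, adv)[0]
        # Gradient *ascent*: make the published image adversarially hard to learn from
        adv = adv.detach() + step * grad.sign()
        adv = image + (adv - image).clamp(-eps, eps)  # project back into L_inf budget
        adv = adv.clamp(-1, 1)
    return adv.detach()

Fine-tuning a customized model (e.g., DreamBooth-style) on images produced by a routine like this, and comparing the generations against a clean baseline, would mirror the kind of evaluation the abstract describes.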
URL
https://arxiv.org/abs/2404.15081