Abstract
Guided image synthesis methods such as SDEdit, which build on diffusion models, excel at creating realistic images from user inputs such as stroke paintings. However, existing work focuses mainly on image quality and often overlooks a key point: a diffusion model represents a data distribution, not individual images. As a result, there is a small but critical chance of generating images that contradict the user's intention, which raises ethical concerns. For example, a user who inputs a stroke painting with female characteristics might, with some probability, get male faces from SDEdit. To expose this potential vulnerability, we aim to build an adversarial attack that forces SDEdit to generate a specific data distribution aligned with a specified attribute (e.g., female) without changing the attribute characteristics of the input. We propose the Targeted Attribute Generative Attack (TAGA), which uses an attribute-aware objective function to optimize the adversarial noise added to the input stroke painting. Empirical studies reveal that traditional adversarial noise struggles to achieve TAGA, while natural perturbations such as exposure and motion blur easily alter the attributes of the generated images. To execute effective attacks, we introduce FoolSDEdit. First, we design a joint adversarial exposure and blur attack that adds exposure and motion blur to the stroke painting and optimizes them together. Second, we optimize the execution strategy of the various perturbations by framing it as a network architecture search problem: we create SuperPert, a graph representing diverse execution strategies for different perturbations, and after training we obtain the optimized execution strategy for effective TAGA against SDEdit. Comprehensive experiments on two datasets show that our method compels SDEdit to generate a targeted attribute-aware data distribution, significantly outperforming baselines.
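To make the two natural perturbations named in the abstract concrete, the following is a minimal sketch of applying exposure and motion blur to a stroke painting. It is not the authors' implementation: the scalar `gain` exposure model, the horizontal box kernel, and all function names are simplifying assumptions chosen for illustration. The abstract only states that both perturbations are added to the input and optimized jointly, and that optimization step is not shown here.

```python
import numpy as np


def apply_exposure(image, gain):
    """Scale pixel intensities by a scalar gain and clip to [0, 1].

    Assumption: a single global gain; the paper may use a richer exposure model.
    """
    return np.clip(image * gain, 0.0, 1.0)


def apply_motion_blur(image, length):
    """Blur each row of an H x W x C image with a horizontal box kernel.

    Assumption: uniform horizontal motion over `length` pixels.
    """
    weight = 1.0 / length
    left = length // 2
    right = length - 1 - left
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(image, ((0, 0), (left, right), (0, 0)), mode="edge")
    width = image.shape[1]
    blurred = np.zeros(image.shape, dtype=float)
    for offset in range(length):
        blurred += weight * padded[:, offset:offset + width, :]
    return blurred


def perturb(image, gain, blur_length):
    """Apply exposure first, then motion blur (one possible execution order)."""
    return apply_motion_blur(apply_exposure(image, gain), blur_length)
```

In an attack setting, `gain` and the blur kernel would be treated as the variables being optimized rather than fixed constants, and the order in which the perturbations are applied is one of the choices that an execution-strategy search could explore.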
URL
https://arxiv.org/abs/2402.03705