Abstract
Existing conditional image generative models based on generative adversarial networks (GANs) typically produce a fixed output for the same conditional input, which is unreasonable for highly subjective tasks such as large-mask image inpainting or style transfer. On the other hand, GAN-based diverse image generation methods require retraining or fine-tuning the network, or designing complex noise injection functions; these methods are computationally expensive, task-specific, or struggle to generate high-quality results. Given that many deterministic conditional image generative models already produce high-quality yet fixed results, we raise an intriguing question: is it possible for pre-trained deterministic conditional image generative models to generate diverse results without changing network structures or parameters? To answer this question, we re-examine conditional image generation tasks from the perspective of adversarial attacks and propose a simple and efficient plug-in method, similar to projected gradient descent (PGD), for diverse and controllable image generation. The key idea is to attack the pre-trained deterministic generative model by adding a small perturbation to the input condition. In this way, diverse results can be generated without any adjustment of network structures or fine-tuning of the pre-trained models. In addition, the diverse results can be controlled by specifying the attack direction according to a reference text or image. Our work opens the door to applying adversarial attacks to low-level vision tasks, and experiments on various conditional image generation tasks demonstrate the effectiveness and superiority of the proposed method.
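The attack idea can be illustrated with a minimal sketch. It assumes a hypothetical frozen generator `toy_generator` (a fixed `tanh(W x)` map standing in for a pre-trained network) and two assumed objectives that are not taken from the paper: for diversity, the perturbed output is pushed away from the fixed output by maximizing their squared distance; for control, the reference text or image is simplified to a target output vector that the perturbed output is pulled toward. The update is standard PGD with a random start: a signed gradient step on the perturbation, then projection onto an L-infinity ball of radius `eps`, with the generator weights left untouched.

```python
import numpy as np


def toy_generator(x, W):
    """Hypothetical stand-in for a frozen, deterministic conditional generator: y = tanh(W x)."""
    return np.tanh(W @ x)


def pgd_diversify(x, W, eps=0.05, alpha=0.01, steps=20, target=None, seed=0):
    """PGD-style search for a perturbation delta with ||delta||_inf <= eps on the condition x.

    Assumed objectives (illustrative only):
      target is None -> ascend  L = ||G(x + delta) - G(x)||^2   (move away from the fixed output)
      target given   -> descend L = ||G(x + delta) - target||^2 (steer toward a reference)
    The generator weights W are never modified.
    """
    rng = np.random.default_rng(seed)
    y_fixed = toy_generator(x, W)
    # Random start inside the eps-ball; at delta = 0 the untargeted gradient is exactly zero.
    delta = rng.uniform(-eps, eps, size=x.shape)
    anchor = y_fixed if target is None else target
    direction = 1.0 if target is None else -1.0  # +1: gradient ascent, -1: gradient descent
    for _ in range(steps):
        y = toy_generator(x + delta, W)
        # Analytic gradient of ||tanh(W (x + delta)) - anchor||^2 with respect to delta.
        grad = W.T @ (2.0 * (y - anchor) * (1.0 - y ** 2))
        delta = delta + direction * alpha * np.sign(grad)
        delta = np.clip(delta, -eps, eps)  # project back onto the L-infinity ball
    x_adv = x + delta
    return x_adv, toy_generator(x_adv, W)
```

Calling `pgd_diversify(x, W, seed=s)` with different seeds starts from different random points in the ball, which is what lets a deterministic generator return different outputs for the same condition; with a real network, the analytic gradient would be replaced by automatic differentiation.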
URL
https://arxiv.org/abs/2403.08294