Paper Reading AI Learner

Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation

2024-03-13 06:57:23
Tianyi Chu, Wei Xing, Jiafu Chen, Zhizhong Wang, Jiakai Sun, Lei Zhao, Haibo Chen, Huaizhong Lin

Abstract

Existing generative adversarial network (GAN) based conditional image generative models typically produce a fixed output for the same conditional input, which is unreasonable for highly subjective tasks such as large-mask image inpainting or style transfer. On the other hand, GAN-based diverse image generation methods require retraining/fine-tuning the network or designing complex noise injection functions, which is computationally expensive, task-specific, or struggles to generate high-quality results. Given that many deterministic conditional image generative models have been able to produce high-quality yet fixed results, we raise an intriguing question: is it possible for pre-trained deterministic conditional image generative models to generate diverse results without changing network structures or parameters? To answer this question, we re-examine conditional image generation tasks from the perspective of adversarial attack and propose a simple and efficient plug-in projected gradient descent (PGD)-like method for diverse and controllable image generation. The key idea is to attack pre-trained deterministic generative models by adding a micro perturbation to the input condition. In this way, diverse results can be generated without any adjustment of network structures or fine-tuning of the pre-trained models. In addition, we can control the diverse results to be generated by specifying the attack direction according to a reference text or image. Our work opens the door to applying adversarial attack to low-level vision tasks, and experiments on various conditional image generation tasks demonstrate the effectiveness and superiority of the proposed method.
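The core idea — perturbing only the input condition of a frozen generator with a PGD-style loop — can be sketched as follows. This is a minimal illustration with a toy differentiable "generator" `G(c) = tanh(W @ c)` standing in for a pre-trained network, and a simple diversity objective (maximize the distance to the fixed output); the function name, loss, and hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pgd_diversity_attack(W, c, eps=0.05, alpha=0.01, steps=20, rng=None):
    """PGD-like attack on a toy deterministic generator G(c) = tanh(W @ c).

    The pretrained weights W are never updated. Only the condition c is
    perturbed within an l_inf ball of radius eps, ascending on the distance
    ||G(c + delta) - G(c)||^2 so the output diverges from the fixed result.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    G = lambda x: np.tanh(W @ x)
    y0 = G(c)                                     # fixed deterministic output
    delta = rng.uniform(-eps, eps, size=c.shape)  # random start inside the ball
    for _ in range(steps):
        y = G(c + delta)
        # gradient of ||y - y0||^2 w.r.t. delta (chain rule through tanh)
        grad = W.T @ ((1.0 - y ** 2) * 2.0 * (y - y0))
        delta = delta + alpha * np.sign(grad)     # gradient *ascent* step
        delta = np.clip(delta, -eps, eps)         # project back into the ball
    return c + delta, G(c + delta)
```

Different random starts of `delta` yield different perturbed conditions, and hence diverse outputs from the same frozen generator; replacing the diversity objective with a similarity loss toward a reference embedding would steer the attack direction, as the abstract describes for text/image-guided control.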

URL

https://arxiv.org/abs/2403.08294

PDF

https://arxiv.org/pdf/2403.08294.pdf

