Paper Reading AI Learner

Efficiently Adversarial Examples Generation for Visual-Language Models under Targeted Transfer Scenarios using Diffusion Models

2024-04-16 07:19:52
Qi Guo, Shanmin Pang, Xiaojun Jia, Qing Guo

Abstract

Targeted transfer-based attacks involving adversarial examples pose a significant threat to large visual-language models (VLMs). However, the state-of-the-art (SOTA) transfer-based attacks incur high costs due to excessive iteration counts. Furthermore, the generated adversarial examples exhibit pronounced adversarial noise and demonstrate limited efficacy in evading defense methods such as DiffPure. To address these issues, inspired by score matching, we introduce AdvDiffVLM, which utilizes diffusion models to generate natural, unrestricted adversarial examples. Specifically, AdvDiffVLM employs Adaptive Ensemble Gradient Estimation to modify the score during the diffusion model's reverse generation process, ensuring the adversarial examples produced contain natural adversarial semantics and thus possess enhanced transferability. Simultaneously, to enhance the quality of adversarial examples further, we employ the GradCAM-guided Mask method to disperse adversarial semantics throughout the image, rather than concentrating them in a specific area. Experimental results demonstrate that our method achieves a speedup ranging from 10X to 30X compared to existing transfer-based attack methods, while maintaining superior quality of adversarial examples. Additionally, the generated adversarial examples possess strong transferability and exhibit increased robustness against adversarial defense methods. Notably, AdvDiffVLM can successfully attack commercial VLMs, including GPT-4V, in a black-box manner.
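The abstract says AdvDiffVLM modifies the score during the reverse diffusion process using an ensemble gradient, and restricts or redistributes that guidance with a mask. The snippet below is a minimal sketch of that general idea under heavy assumptions, not the authors' implementation. It uses the classifier-guidance-style identity that adding `scale * g` to the score is the same as subtracting `sqrt(1 - alpha_bar) * scale * g` from the predicted noise. The surrogate "VLM encoders" are random linear maps. The denoiser is the exact noise predictor for N(0, I) data, standing in for a trained model. The paper's Adaptive Ensemble Gradient Estimation is replaced by a plain uniform average, and `mask` is a hypothetical stand-in for a GradCAM-derived mask. All function names (`cosine_grad`, `ensemble_guidance`, `toy_eps`, `guided_sample`) are illustrative.

```python
import numpy as np


def cosine_grad(x, W, t):
    """Gradient of cos(W @ x, t) w.r.t. x for a toy linear surrogate encoder W."""
    u = W @ x
    nu, nt = np.linalg.norm(u), np.linalg.norm(t)
    cos = u @ t / (nu * nt)
    # d cos / d u = t / (|u||t|) - cos * u / |u|^2, then chain rule through u = W x
    du = t / (nu * nt) - cos * u / nu**2
    return W.T @ du, cos


def ensemble_guidance(x, encoders, targets):
    """Uniform average over surrogates (assumption: replaces the paper's adaptive weighting)."""
    grads, coss = zip(*(cosine_grad(x, W, t) for W, t in zip(encoders, targets)))
    return np.mean(grads, axis=0), float(np.mean(coss))


def toy_eps(x_t, alpha_bar):
    # Exact E[eps | x_t] when data ~ N(0, I); a placeholder for a trained noise predictor.
    return np.sqrt(1.0 - alpha_bar) * x_t


def guided_sample(encoders, targets, d, scale, mask=None, T=100, seed=0):
    """DDPM ancestral sampling with an ensemble-gradient score shift (toy sketch)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    mask = np.ones(d) if mask is None else mask  # hypothetical stand-in for a GradCAM mask
    x = rng.standard_normal(d)
    for t in reversed(range(T)):
        eps = toy_eps(x, alpha_bars[t])
        g, _ = ensemble_guidance(x, encoders, targets)
        # Shifting the score by scale * g == shifting eps by -sqrt(1 - alpha_bar) * scale * g.
        eps_hat = eps - np.sqrt(1.0 - alpha_bars[t]) * scale * (mask * g)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # Add sampling noise on every step except the last one.
        x = mean + (np.sqrt(betas[t]) * rng.standard_normal(d) if t > 0 else 0.0)
    return x


# Usage: three random linear "surrogates", all targeting the embedding of one target input.
rng = np.random.default_rng(1)
d, k = 64, 16
encoders = [rng.standard_normal((k, d)) for _ in range(3)]
x_target = rng.standard_normal(d)
targets = [W @ x_target for W in encoders]

plain = guided_sample(encoders, targets, d, scale=0.0)
guided = guided_sample(encoders, targets, d, scale=100.0)
print("unguided ensemble cosine:", ensemble_guidance(plain, encoders, targets)[1])
print("guided ensemble cosine:  ", ensemble_guidance(guided, encoders, targets)[1])
```

Because the guidance enters only through the predicted noise, the sampler itself is unchanged and the random draws are identical across runs with the same seed. That makes it easy to see the effect of the score shift in isolation, and a zero mask reproduces the unguided sample exactly.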


URL

https://arxiv.org/abs/2404.10335

PDF

https://arxiv.org/pdf/2404.10335.pdf

