Black-box Backdoor Defense via Zero-shot Image Purification

2023-03-21 20:21:44
Yucheng Shi, Mengnan Du, Xuansheng Wu, Zihan Guan, Ninghao Liu

Abstract

Backdoor attacks inject poisoned data into the training set, resulting in misclassification of the poisoned samples during model inference. Defending against such attacks is challenging, especially in real-world black-box settings where only model predictions are available. In this paper, we propose a novel backdoor defense framework that can effectively defend against various attacks through zero-shot image purification (ZIP). Our proposed framework can be applied to black-box models without requiring any internal information about the poisoned model or any prior knowledge of the clean/poisoned samples. Our defense framework involves a two-step process. First, we apply a linear transformation on the poisoned image to destroy the trigger pattern. Then, we use a pre-trained diffusion model to recover the missing semantic information removed by the transformation. In particular, we design a new reverse process using the transformed image to guide the generation of high-fidelity purified images, which can be applied in zero-shot settings. We evaluate our ZIP backdoor defense framework on multiple datasets with different kinds of attacks. Experimental results demonstrate the superiority of our ZIP framework compared to state-of-the-art backdoor defense baselines. We believe that our results will provide valuable insights for future defense methods for black-box models.
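The first step of the two-step process described above can be illustrated with a small sketch. The abstract does not say which linear transformation is used, so the choice below (k x k average pooling followed by nearest-neighbour upsampling) is an assumption made only for illustration, as are the function name `linear_transform`, the block size `k`, and the synthetic checkerboard-style trigger patch. The second step, recovering semantic detail with a pre-trained diffusion model guided by the transformed image, is not reproduced here.

```python
import numpy as np


def linear_transform(image: np.ndarray, k: int = 4) -> np.ndarray:
    """Average-pool k x k blocks, then upsample back by repetition.

    Hypothetical stand-in for the trigger-destroying transformation;
    the operation is linear in the pixel values.
    """
    h, w, c = image.shape
    if h % k or w % k:
        raise ValueError("height and width must be divisible by k")
    # Group pixels into k x k blocks and average each block.
    pooled = image.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))
    # Repeat each pooled value k times along both spatial axes.
    return np.repeat(np.repeat(pooled, k, axis=0), k, axis=1)


# Synthetic example: a random "clean" image with a small patch trigger
# stamped into the bottom-right corner (an assumed trigger shape).
rng = np.random.default_rng(0)
clean = rng.random((32, 32, 3))
poisoned = clean.copy()
poisoned[-4:, -4:, :] = 0.0
poisoned[-4::2, -4::2, :] = 1.0  # 4 of the 16 patch pixels set to 1

transformed = linear_transform(poisoned, k=4)

# The high-frequency pattern inside the patch is flattened to its mean.
print("patch std before:", poisoned[-4:, -4:, :].std())
print("patch std after: ", transformed[-4:, -4:, :].std())
```

In this sketch the transformation also removes fine detail from the rest of the image, which is the loss of semantic information that the diffusion-based second step is meant to recover.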

URL

https://arxiv.org/abs/2303.12175

PDF

https://arxiv.org/pdf/2303.12175.pdf

