From Shortcuts to Triggers: Backdoor Defense with Denoised PoE

2023-05-24 08:59:25
Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen

Abstract

Language models are often at risk of diverse backdoor attacks, especially those mounted through data poisoning, so it is important to investigate defenses against them. Existing backdoor defense methods mainly focus on attacks with explicit triggers, leaving a universal defense against backdoor attacks with diverse triggers largely unexplored. In this paper, we propose DPoE (Denoised Product-of-Experts), an end-to-end ensemble-based backdoor defense framework that is inspired by the shortcut nature of backdoor attacks and is designed to defend against various types of them. DPoE consists of two models: a shallow model that captures the backdoor shortcuts and a main model that is prevented from learning them. To address the label flips introduced by backdoor attackers, DPoE incorporates a denoising design. Experiments on the SST-2 dataset show that DPoE significantly improves defense performance against various types of backdoor triggers, including word-level, sentence-level, and syntactic triggers. Furthermore, DPoE remains effective under a more challenging but practical setting that mixes multiple types of triggers.
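
The abstract does not give the training objective, so the following is only a minimal sketch of a generic product-of-experts (PoE) loss of the kind the two-model description suggests. It assumes the two experts are combined by summing their log-probabilities and renormalizing; the function names and the `beta` weight are hypothetical, and the denoising component of DPoE is not modeled here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def poe_log_probs(main_logits, shallow_logits, beta=1.0):
    """Combine two experts by adding log-probabilities, then renormalize.

    `beta` (hypothetical) scales how much the shallow expert contributes.
    """
    main_lp = log_softmax(main_logits)
    shallow_lp = log_softmax(shallow_logits)
    combined = [m + beta * s for m, s in zip(main_lp, shallow_lp)]
    return log_softmax(combined)


def poe_loss(main_logits, shallow_logits, label, beta=1.0):
    """Cross-entropy of the combined PoE distribution for one example."""
    return -poe_log_probs(main_logits, shallow_logits, beta)[label]


# Main model undecided; shallow model either confident or also undecided.
loss_shortcut = poe_loss([0.0, 0.0], [5.0, -5.0], label=0)
loss_no_shortcut = poe_loss([0.0, 0.0], [0.0, 0.0], label=0)
print(loss_shortcut < loss_no_shortcut)
```

When the shallow expert already predicts the (possibly poisoned) label with high confidence, the combined loss is small, so the main model receives little training signal from that example. This is the general mechanism by which PoE training discourages a main model from relying on shortcuts; in PoE debiasing setups, typically only the main model is used at inference time.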

URL

https://arxiv.org/abs/2305.14910

PDF

https://arxiv.org/pdf/2305.14910.pdf

