Abstract
The prompt-based learning paradigm has demonstrated remarkable efficacy in enhancing the adaptability of pretrained language models (PLMs), particularly in few-shot scenarios. However, this learning paradigm has been shown to be vulnerable to backdoor attacks. The current clean-label attack, which employs a specific prompt as the trigger, succeeds without external triggers and keeps the labels of poisoned samples correct, making it stealthier than the poisoned-label attack. On the other hand, it suffers from frequent false activations and is harder to mount, requiring a higher poisoning rate. With conventional negative data augmentation methods, we find it difficult to trade off effectiveness against stealthiness in the clean-label setting. To address this issue, we start from the notion that a backdoor acts as a shortcut and posit that this shortcut stems from the contrast between the trigger and the data used for poisoning. In this study, we propose a method named Contrastive Shortcut Injection (CSI), which leverages activation values to integrate trigger design and data selection strategies, thereby crafting stronger shortcut features. Through extensive experiments on full-shot and few-shot text classification tasks, we empirically validate CSI's high effectiveness and high stealthiness at low poisoning rates. Notably, we find that the two strategies play the leading role in the full-shot and few-shot settings, respectively.
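To make the contrast-based selection idea concrete, below is a minimal sketch of how activation values might be used to pick clean-label samples to poison. It assumes the "contrast" is measured as the shift of a sample's hidden representation when the trigger prompt is attached versus absent; the cosine-distance scoring, function names, and the top-k selection criterion are illustrative assumptions, not the paper's exact procedure.

```python
import torch


def contrast_score(act_with_trigger: torch.Tensor,
                   act_without_trigger: torch.Tensor) -> torch.Tensor:
    """Per-sample contrast between PLM activations with and without the
    candidate trigger prompt (cosine distance here as a stand-in metric)."""
    cos = torch.nn.functional.cosine_similarity(
        act_with_trigger, act_without_trigger, dim=-1)
    return 1.0 - cos  # larger value = stronger contrast


def select_poison_indices(act_with_trigger: torch.Tensor,
                          act_without_trigger: torch.Tensor,
                          labels: torch.Tensor,
                          target_label: int,
                          poison_budget: int) -> torch.Tensor:
    """Pick target-class samples whose representations shift the most when the
    trigger prompt is attached, i.e. those contrasting most sharply with the
    trigger, where a shortcut is hypothesised to form most easily."""
    scores = contrast_score(act_with_trigger, act_without_trigger)
    # Clean-label constraint: only samples already carrying the target label.
    scores = scores.masked_fill(labels != target_label, float("-inf"))
    return torch.topk(scores, k=poison_budget).indices


if __name__ == "__main__":
    # Toy usage with fabricated activations standing in for PLM hidden states.
    n, d = 32, 768
    acts_plain = torch.randn(n, d)
    acts_trig = acts_plain + 0.1 * torch.randn(n, d)
    labels = torch.randint(0, 2, (n,))
    idx = select_poison_indices(acts_trig, acts_plain, labels,
                                target_label=1, poison_budget=4)
    print("clean-label samples to poison:", idx.tolist())
```

In a real attack the activation tensors would come from the victim PLM (e.g. the representation at the mask or verbalizer position), and an analogous contrast criterion could rank candidate trigger prompts themselves.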
URL
https://arxiv.org/abs/2404.00461