Abstract
Existing defenses against adversarial attacks fall into two categories: training-time and test-time defenses. Training-time defense, i.e., adversarial training, requires significant extra training time and often fails to generalize to unseen attacks. Test-time defense via test-time weight adaptation, on the other hand, requires gradient access to (part of) the model weights, which can be infeasible for models with frozen weights. To address these challenges, we propose DRAM, a novel defense method to Detect and Reconstruct multiple types of Adversarial attacks via a Masked autoencoder (MAE). We demonstrate how MAE losses can be used to build a Kolmogorov-Smirnov (KS) test that detects adversarial attacks. Moreover, the MAE losses can be used to repair adversarial samples from unseen attack types. In this sense, DRAM requires neither model weight updates at test time nor augmenting the training set with additional adversarial samples. Evaluated on the large-scale ImageNet data, DRAM achieves the best average detection rate of 82% across eight types of adversarial attacks compared with other detection baselines. For reconstruction, DRAM improves robust accuracy by 6% ~ 41% for Standard ResNet50 and 3% ~ 8% for Robust ResNet50 compared with other self-supervision tasks, such as rotation prediction and contrastive learning.
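The detection idea described above can be illustrated with a minimal sketch: compare the distribution of MAE reconstruction losses on a trusted clean reference set against the losses on an incoming batch with a two-sample KS test. The loss values below are synthetic placeholders (the paper computes them from a masked autoencoder), and the function names, the 5%-level critical-value constant 1.358, and the specific loss magnitudes are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def flag_adversarial(clean_losses, test_losses, c_alpha=1.358):
    """Flag the test batch as adversarial when its loss distribution
    differs from the clean reference at the ~5% level (c_alpha=1.358
    is the asymptotic KS critical-value coefficient for alpha=0.05)."""
    n, m = len(clean_losses), len(test_losses)
    critical = c_alpha * np.sqrt((n + m) / (n * m))
    return ks_statistic(clean_losses, test_losses) > critical

# Synthetic stand-ins for MAE reconstruction losses: adversarial
# inputs tend to reconstruct worse, shifting the loss distribution.
rng = np.random.default_rng(0)
clean_ref = rng.normal(0.20, 0.02, size=500)
adv_batch = rng.normal(0.35, 0.05, size=100)
benign_batch = rng.normal(0.20, 0.02, size=100)

print(flag_adversarial(clean_ref, adv_batch))  # shifted losses are flagged
```

In practice the reference losses would be collected once from held-out clean data, and the same per-sample MAE loss used here for detection also serves as the objective for repairing flagged inputs.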
URL
https://arxiv.org/abs/2303.12848