Abstract
Adversarial attacks aim to disrupt the functionality of a target system by adding specific noise to input samples, which poses potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques detect some specific adversarial faces (adv-faces) with high accuracy, new attack methods, especially GAN-based attacks with completely different noise patterns, circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before the defense can be implemented, which makes it impractical to defend against newly emerging attacks that defenders have not yet seen. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to both the victim facial recognition system and unseen attacks. Regarding adv-faces as out-of-distribution data, we then introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier that focuses on abnormal local color aberrations. Experiments conducted on the LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.
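The idea of training on real faces and their self-perturbations, then scoring an image by its most abnormal local region, can be illustrated with a minimal NumPy sketch. Everything named here is an assumption for illustration: the abstract does not specify the three noise patterns, so `self_perturb` uses plain bounded uniform noise as a stand-in, and `max_pool_score` simply takes the maximum over hypothetical patch-level scores. It is not the authors' implementation.

```python
import numpy as np


def self_perturb(face, eps=8 / 255, rng=None):
    """Return a pseudo adv-face: the real face plus bounded random noise.

    Uniform noise in [-eps, eps] is an illustrative stand-in, NOT one of the
    paper's three noise patterns, which the abstract does not specify.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(-eps, eps, size=face.shape)
    # Keep pixel values in the valid [0, 1] range after perturbation.
    return np.clip(face + noise, 0.0, 1.0)


def build_training_set(real_faces, rng=None):
    """Pair each real face (label 0) with a self-perturbed copy (label 1).

    No attack data is needed: the positive class is built from real faces only.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    pseudo = np.stack([self_perturb(f, rng=rng) for f in real_faces])
    x = np.concatenate([real_faces, pseudo])
    y = np.concatenate([np.zeros(len(real_faces)), np.ones(len(pseudo))])
    return x, y


def max_pool_score(patch_scores):
    """Image-level score: the maximum of the patch-level abnormality scores.

    A single strongly abnormal patch is enough to flag the whole image.
    """
    return float(np.max(patch_scores))
```

A detector trained on the output of `build_training_set` would see only real faces and their perturbed copies. The classifier that produces patch-level scores, and the decision boundary regularization, are omitted here because the abstract gives no detail on them.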
URL
https://arxiv.org/abs/2304.11359