Abstract
Despite their tremendous success in modelling high-dimensional data manifolds, deep neural networks are vulnerable to adversarial attacks: perceptually valid, input-like samples, obtained through careful perturbations, that degrade the performance of the underlying model. Major shortcomings of existing defense mechanisms include poor generalization across attacks and models, and large inference time. In this paper, we propose a generalized defense mechanism that capitalizes on the expressive power of regularized latent-space generative models. We design an adversarial filter that requires no access to the classifier or the adversary, which makes it usable in tandem with any classifier. The basic idea is to learn a Lipschitz-constrained mapping from the data manifold, incorporating adversarial perturbations, to a quantized latent space, and to re-map it to the true data manifold. Specifically, we simultaneously auto-encode the data manifold and its perturbations, implicitly, through perturbations of the regularized and quantized generative latent space, realized using variational inference. We demonstrate the efficacy of the proposed formulation in providing resilience against multiple attack types (black- and white-box) and methods, while remaining almost real-time. Our experiments show that the proposed method surpasses state-of-the-art techniques in several cases.
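The encode→quantize→decode pipeline described above can be sketched in miniature. This is a toy NumPy illustration of the general idea only, not the paper's method: the linear maps stand in for the learned (Lipschitz-constrained) encoder and decoder, the random codebook stands in for the regularized, quantized latent space, and all dimensions and names are hypothetical. The point it shows is that quantization discards small latent-space perturbations, so the filter can sit in front of any classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper):
# input dim, latent dim, codebook size.
D_IN, D_LAT, K = 8, 4, 16

# Random linear stand-ins for the learned encoder/decoder networks.
W_enc = rng.normal(size=(D_LAT, D_IN)) / np.sqrt(D_IN)
W_dec = rng.normal(size=(D_IN, D_LAT)) / np.sqrt(D_LAT)

# Stand-in for the quantized latent space: a fixed codebook of K vectors.
codebook = rng.normal(size=(K, D_LAT))

def quantize(z):
    """Snap each latent vector to its nearest codebook entry (L2 distance)."""
    dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    return codebook[np.argmin(dists, axis=1)]

def purify(x):
    """Adversarial filter: encode, quantize, decode. Classifier-agnostic."""
    z = x @ W_enc.T
    return quantize(z) @ W_dec.T

x_clean = rng.normal(size=(2, D_IN))
x_adv = x_clean + 0.01 * rng.normal(size=(2, D_IN))  # small perturbation

# A perturbation that leaves the latent in the same quantization cell
# is erased entirely by the filter.
erased = np.allclose(purify(x_clean), purify(x_adv))
```

Because every filtered input decodes from one of only K latent codes, an attacker's fine-grained perturbation budget has very little surface to act on; the paper's actual construction learns the encoder, decoder, and latent regularization jointly via variational inference rather than fixing them as here.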
URL
https://arxiv.org/abs/1903.09940