A Direct Approach to Robust Deep Learning Using Adversarial Networks

Abstract

Deep neural networks have been shown to perform well on many classical machine learning problems, especially image classification tasks. However, researchers have found that neural networks can be easily fooled: they are surprisingly sensitive to small perturbations that are imperceptible to humans. Carefully crafted input images (adversarial examples) can force a well-trained neural network to produce arbitrary outputs. Including adversarial examples during training is a popular defense mechanism against adversarial attacks. In this paper we propose a new defensive mechanism under the generative adversarial network (GAN) framework. We model the adversarial noise using a generative network, trained jointly with a discriminative classification network as a minimax game. We show empirically that our adversarial network approach works well against black-box attacks, with performance on par with state-of-the-art methods such as ensemble adversarial training and adversarial training with projected gradient descent.
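
The minimax training described in the abstract can be illustrated with a toy sketch. It is not the paper's architecture or training procedure. It assumes a logistic-regression classifier and a tiny generator that outputs noise bounded by eps through a tanh; the names sigmoid, perturb, train, and predict, the parameters eps and lr, and the choice of simultaneous per-sample gradient steps are all hypothetical choices made for illustration. The generator takes gradient ascent steps on the classification loss while the classifier takes descent steps on the same perturbed example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def perturb(A, c, x, eps):
    # Toy generator (assumption): delta = eps * tanh(A x + c), so |delta_i| <= eps.
    d = len(x)
    return [eps * math.tanh(sum(A[i][j] * x[j] for j in range(d)) + c[i])
            for i in range(d)]


def predict(w, b, x):
    # Toy classifier (assumption): logistic regression, returns P(y = 1 | x).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(data, eps=0.3, lr=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    d = len(data[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(d)]   # classifier weights
    b = 0.0                                          # classifier bias
    A = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)]
    c = [0.0] * d                                    # generator bias
    for _ in range(epochs):
        for x, y in data:
            delta = perturb(A, c, x, eps)
            x_adv = [xi + di for xi, di in zip(x, delta)]
            # Gradient of the cross-entropy loss with respect to the logit.
            g = predict(w, b, x_adv) - y
            # Generator step: gradient ASCENT on the loss (the "max" player).
            # d(delta_i)/d(u_i) = eps * (1 - tanh(u_i)^2) = eps * (1 - (delta_i/eps)^2)
            for i in range(d):
                du = g * w[i] * eps * (1.0 - (delta[i] / eps) ** 2)
                for j in range(d):
                    A[i][j] += lr * du * x[j]
                c[i] += lr * du
            # Classifier step: gradient DESCENT on the same loss (the "min" player),
            # using g and x_adv computed before either player moved.
            for i in range(d):
                w[i] -= lr * g * x_adv[i]
            b -= lr * g
    return w, b, A, c


# Two well-separated 2-D clusters with labels 1 and 0.
data = [([2.0, 2.0], 1), ([2.5, 1.5], 1), ([1.5, 2.5], 1),
        ([-2.0, -2.0], 0), ([-2.5, -1.5], 0), ([-1.5, -2.5], 0)]
w, b, A, c = train(data)
```

After training, predict(w, b, x) scores clean inputs, and perturb(A, c, x, 0.3) returns the learned noise for an input. Bounding the noise with eps * tanh is one simple way to keep perturbations small; a real image model would use deep networks for both players and a framework with automatic differentiation.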

URL

https://arxiv.org/abs/1905.09591

PDF

https://arxiv.org/pdf/1905.09591.pdf