Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples

Abstract

Image compression-based approaches for defending against adversarial-example attacks, which threaten the safe use of deep neural networks (DNNs), have been investigated recently. However, prior works mainly rely on directly tuning parameters such as the compression rate to blindly reduce image features, and therefore lack guarantees on both defense efficiency (i.e., classification accuracy on polluted images) and classification accuracy on benign images after the defense is applied. To overcome these limitations, we propose a JPEG-based defensive compression framework, namely "feature distillation", to effectively rectify adversarial examples without impacting classification accuracy on benign data. Our framework significantly improves defense efficiency with marginal accuracy reduction using a two-step method. First, we maximize the filtering of malicious features in adversarial input perturbations by developing defensive quantization in the frequency domain of JPEG compression or decompression, guided by a semi-analytical method. Second, we suppress the distortion of benign features to restore classification accuracy through a DNN-oriented quantization refinement process. Our experimental results show that the proposed "feature distillation" significantly surpasses the latest input-transformation based mitigations, such as Quilting and TV Minimization, in three aspects: defense efficiency (classification accuracy on adversarial examples improves from $\sim20\%$ to $\sim90\%$), accuracy on benign images after defense ($\le1\%$ accuracy degradation), and processing time per image ($\sim259\times$ speedup). Moreover, our solution also provides the best defense efficiency ($\sim60\%$ accuracy) against the recent adaptive attack, with the least accuracy reduction ($\sim1\%$) on benign images, when compared with other input-transformation based defense methods.
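
The first step of the method, quantization in the frequency domain of JPEG, can be illustrated with a small sketch. The code below is not the authors' implementation: it only shows the general idea of quantizing the 8x8 DCT coefficients of a block with a table that uses a fine step for low frequencies and a coarse step for high frequencies. The function names, the `u + v < 4` frequency split, and the step sizes 4 and 40 are assumptions chosen for illustration; the abstract states that the actual quantization is derived by a semi-analytical method and then refined for the DNN, and it does not give these values.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows are frequencies)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)        # DC row has a different scale
    return m

def defensive_quantize(block, table):
    """Quantize and dequantize one square block in the DCT domain.

    block: 2-D array of pixel values in [0, 255].
    table: per-coefficient quantization steps, same shape as block.
    """
    d = dct_matrix(block.shape[0])
    coeffs = d @ (block - 128.0) @ d.T            # forward 2-D DCT (level-shifted)
    quantized = np.round(coeffs / table) * table  # round to the nearest step
    return d.T @ quantized @ d + 128.0            # inverse 2-D DCT

# Hypothetical table: fine step (4) for low frequencies, coarse step (40)
# for high frequencies. These values are illustrative assumptions only.
u, v = np.indices((8, 8))
TABLE = np.where(u + v < 4, 4.0, 40.0)

# A smooth "benign" block built from a single low-frequency coefficient.
d8 = dct_matrix(8)
low_freq = np.zeros((8, 8))
low_freq[0, 1] = 80.0
clean = d8.T @ low_freq @ d8 + 128.0

# A small checkerboard pattern standing in for a high-frequency perturbation.
noise = 2.0 * (-1.0) ** (u + v)

restored = defensive_quantize(clean + noise, TABLE)
print(np.abs(restored - clean).max())  # residual error after quantization
```

In this toy setting the coarse high-frequency steps round the small perturbation away while the low-frequency content is preserved, which is the trade-off the abstract describes between filtering malicious features and keeping benign ones. Real adversarial perturbations are not confined to high frequencies, so this should be read only as an intuition for why the choice of quantization table matters.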

URL

https://arxiv.org/abs/1803.05787

PDF

https://arxiv.org/pdf/1803.05787.pdf