Abstract
While it has been shown in the literature that simultaneously accurate and robust classifiers exist for common datasets, previous methods that improve the adversarial robustness of classifiers often manifest an accuracy-robustness trade-off. We build upon recent advancements in data-driven ``locally biased smoothing'' to develop classifiers that treat benign and adversarial test data differently. Specifically, we tailor the smoothing operation to the use of a robust neural network as the source of robustness. We then extend the smoothing procedure to the multi-class setting and adapt an adversarial input detector into a policy network. The policy network adaptively adjusts the mixture of a robust base classifier and a standard network, where the standard network is optimized for clean accuracy and is not robust in general. We provide theoretical analyses that motivate the adaptive smoothing procedure, certify the robustness of the smoothed classifier under realistic assumptions, and justify the introduction of the policy network. We use various attack methods, including AutoAttack and adaptive attacks, to empirically verify that the smoothed model noticeably improves the accuracy-robustness trade-off. On the CIFAR-100 dataset, our method simultaneously achieves an 80.09\% clean accuracy and a 32.94\% AutoAttacked accuracy. The code that implements adaptive smoothing is available at this https URL.
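The core mechanism described above is an input-dependent mixture of two classifiers' outputs, with the mixing weight produced by a policy network. The following is a minimal toy sketch of that idea; all networks are stand-in linear maps and `policy_alpha` is a hypothetical placeholder, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in weights: a "standard" network (optimized for clean
# accuracy) and a "robust" base classifier, each mapping 4-dim inputs
# to 3 class logits.
W_std = rng.standard_normal((4, 3))
W_rob = rng.standard_normal((4, 3))

def standard_logits(x):
    # stand-in for the standard (accurate but non-robust) network
    return x @ W_std

def robust_logits(x):
    # stand-in for the adversarially robust base classifier
    return x @ W_rob

def policy_alpha(x):
    # hypothetical policy network: maps each input to a mixing weight
    # in [0, 1]; a larger alpha leans more on the robust classifier,
    # e.g. for inputs flagged as likely adversarial
    return 1.0 / (1.0 + np.exp(-x.sum(axis=-1, keepdims=True)))

def adaptive_smoothing(x):
    # convex combination of the two classifiers' logits, weighted
    # per-input by the policy
    alpha = policy_alpha(x)
    return (1.0 - alpha) * standard_logits(x) + alpha * robust_logits(x)

x = rng.standard_normal((2, 4))
print(adaptive_smoothing(x).shape)  # (2, 3)
```

On clean inputs the policy can drive alpha toward 0, preserving the standard network's accuracy, while on suspected attacks it can shift weight to the robust classifier, which is how the method trades off the two regimes adaptively.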
URL
https://arxiv.org/abs/2301.12554