Abstract
The vulnerability of deep neural networks to adversarial patches has motivated numerous defense strategies for boosting model robustness. However, prevailing defenses depend on a single observation or on pre-established adversary information to counter adversarial patches, so they often fail against unseen or adaptive adversarial attacks and perform poorly in dynamic 3D environments. Inspired by active human perception and recurrent feedback mechanisms, we develop Embodied Active Defense (EAD), a proactive defensive strategy that actively contextualizes environmental information to address misaligned adversarial patches in 3D real-world settings. To achieve this, EAD develops two central recurrent sub-modules, a perception module and a policy module, which implement two critical functions of active vision. These modules recurrently process a series of beliefs and observations, progressively refining their comprehension of the target object and enabling the development of strategic actions to counter adversarial patches in 3D environments. To optimize learning efficiency, we incorporate a differentiable approximation of environmental dynamics and deploy patches that are agnostic to the adversary's strategies. Extensive experiments demonstrate that EAD substantially enhances robustness against a variety of patches within just a few steps through its action policy in safety-critical tasks (e.g., face recognition and object detection), without compromising standard accuracy. Furthermore, owing to its attack-agnostic design, EAD generalizes well to unseen attacks, reducing the average attack success rate by 95% across a range of unseen adversarial attacks.
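The recurrent interplay between perception and policy described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: in EAD both modules are learned recurrent networks, whereas here `perceive`, `policy`, `toy_env`, and `run_episode` are hypothetical hand-written stand-ins, the belief is a plain running mean of per-class scores, and the environment is a two-class toy in which a patch dominates only at the initial viewpoint.

```python
from typing import Callable, List, Tuple

Belief = List[float]       # running per-class evidence (hypothetical representation)
Observation = List[float]  # per-class scores seen from the current viewpoint


def perceive(belief: Belief, obs: Observation, step: int) -> Belief:
    """Recurrent perception stand-in: fold the new observation into a running mean."""
    return [(b * step + o) / (step + 1) for b, o in zip(belief, obs)]


def policy(belief: Belief, viewpoint: int) -> int:
    """Policy stand-in: always move to the next viewpoint.

    A learned policy would condition on the belief; this placeholder ignores it.
    """
    return viewpoint + 1


def toy_env(viewpoint: int) -> Observation:
    """Hypothetical environment: the patch fools the observer only at viewpoint 0."""
    if viewpoint == 0:
        return [0.1, 0.9]  # patch makes the wrong class (index 1) look likely
    return [0.8, 0.2]      # from other viewpoints the true class (index 0) dominates


def run_episode(env: Callable[[int], Observation], steps: int) -> Tuple[int, Belief]:
    """Alternate observation, belief update, and action for a fixed number of steps."""
    belief: Belief = [0.0, 0.0]
    viewpoint = 0
    for t in range(steps):
        obs = env(viewpoint)
        belief = perceive(belief, obs, t)
        viewpoint = policy(belief, viewpoint)
    prediction = max(range(len(belief)), key=belief.__getitem__)
    return prediction, belief


if __name__ == "__main__":
    print(run_episode(toy_env, steps=1))  # single observation from the patched view
    print(run_episode(toy_env, steps=4))  # belief refined over several viewpoints
```

In this toy setting a single observation yields the wrong class, while a few steps of active viewpoint change let the accumulated belief recover the correct one, which mirrors the abstract's argument against single-observation defenses.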
URL
https://arxiv.org/abs/2404.00540