Abstract
ML-based malware detection on dynamic analysis reports is vulnerable to both evasion and spurious correlations. In this work, we investigate a specific ML architecture employed in the pipeline of a widely-known commercial antivirus company, with the goal of hardening it against adversarial malware. Adversarial training, the sole defensive technique that can confer empirical robustness, is not applicable out of the box in this domain, principally because gradient-based perturbations rarely map back to feasible problem-space programs. We introduce a novel Reinforcement Learning approach for constructing adversarial examples, a constituent part of adversarially training a model against evasion. Our approach comes with multiple advantages. It performs only modifications that are feasible in the problem space, and thus circumvents the inverse mapping problem. It also makes it possible to provide theoretical guarantees on the robustness of the model against a particular set of adversarial capabilities. Our empirical exploration validates our theoretical insights: we consistently reach a 0% attack success rate after a few adversarial retraining iterations.
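The abstract does not spell out the retraining loop, so the sketch below is only a toy illustration, not the paper's implementation. It assumes a bag-of-tokens scorer in place of the real classifier, replaces the RL agent with a greedy stand-in that applies only problem-space-feasible modifications (appending behaviours a real program could exhibit to the dynamic-analysis report), and reduces "retraining" to crude token re-weighting. All identifiers (FEASIBLE_ACTIONS, rl_attack, adversarial_retraining, the token names) are hypothetical.

```python
# Toy sketch of RL-style adversarial retraining on dynamic-analysis reports.
# NOT the paper's code: greedy attacker, bag-of-tokens scorer, placeholder retraining.

FEASIBLE_ACTIONS = ["add_benign_api_call", "insert_registry_read", "append_file_access"]

def classify(report, weights):
    # Toy malware score: average weight of the tokens in the report.
    return sum(weights.get(tok, 0.0) for tok in report) / max(len(report), 1)

def apply_action(report, action):
    # Problem-space modification: only append behaviour a real program could exhibit.
    return report + [action]

def rl_attack(report, weights, budget=5, threshold=0.5):
    # Greedy stand-in for the RL agent: pick the feasible action that lowers the score most.
    for _ in range(budget):
        best = min(FEASIBLE_ACTIONS, key=lambda a: classify(apply_action(report, a), weights))
        report = apply_action(report, best)
        if classify(report, weights) < threshold:
            return report, True
    return report, False

def adversarial_retraining(malware_reports, weights, rounds=3):
    for _ in range(rounds):
        evasive = []
        for rep in malware_reports:
            adv, success = rl_attack(rep, weights)
            if success:
                evasive.append(adv)
        # Placeholder "retraining": up-weight tokens seen in successful evasions.
        for rep in evasive:
            for tok in rep:
                weights[tok] = weights.get(tok, 0.0) + 0.1
        print(f"attack success rate: {len(evasive) / len(malware_reports):.0%}")
    return weights

if __name__ == "__main__":
    weights = {"create_remote_thread": 1.0, "write_process_memory": 1.0}
    reports = [["create_remote_thread", "write_process_memory"]] * 4
    adversarial_retraining(reports, weights)
```

The point of the sketch is the constraint, not the numbers: the attacker never edits feature vectors directly, only performs actions that correspond to realizable program behaviour, which is why no inverse mapping from feature space back to programs is needed.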
URL
https://arxiv.org/abs/2402.19027