Abstract
Deep learning-based malware classifiers face significant challenges due to concept drift. The rapid evolution of malware, especially the emergence of new families, can depress classification accuracy to near-random levels. Previous research has primarily focused on detecting drift samples, relying on expert-led analysis and labeling for model retraining. However, these methods often lack a comprehensive understanding of malware concepts and provide limited guidance for effective drift adaptation, leading to unstable detection performance and high human labeling costs. To address these limitations, we introduce DREAM, a novel system designed to surpass the capabilities of existing drift detectors and to establish an explanatory drift adaptation process. DREAM enhances drift detection through model sensitivity and data autonomy. The detector, trained in a semi-supervised manner, proactively captures malware behavior concepts through classifier feedback. During testing, it utilizes samples generated by the detector itself, eliminating reliance on extensive training data. For drift adaptation, DREAM expands the scope of human intervention, enabling experts to revise both malware labels and the concept explanations embedded within the detector's latent space. To ensure a comprehensive response to concept drift, it facilitates a coordinated update process for both the classifier and the detector. Our evaluation shows that DREAM effectively improves drift detection accuracy and reduces expert analysis effort during adaptation across different malware datasets and classifiers.
URL
https://arxiv.org/abs/2405.04095