Interpretable Outlier Summarization

Abstract

Outlier detection is critical in real applications to prevent financial fraud, defend against network intrusions, or detect imminent device failures. To reduce the human effort in evaluating outlier detection results and to effectively turn outliers into actionable insights, users often expect a system to automatically produce interpretable summarizations of subgroups of outlier detection results. Unfortunately, to date no such system exists. To fill this gap, we propose STAIR, which learns a compact set of human-understandable rules to summarize and explain anomaly detection results. Rather than using classical decision tree algorithms to produce these rules, STAIR proposes a new optimization objective that produces a small number of rules with the least complexity, and hence strong interpretability, while accurately summarizing the detection results. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high-dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions the data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that, compared to decision tree methods, STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, making them more amenable for humans to understand and evaluate.
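
To make the idea of growing a rule set by iterative splitting more concrete, the following is a minimal sketch in plain Python. It is not STAIR itself: the abstract does not state the actual objective, so the `objective` function here (accuracy minus a penalty `lam` on the total number of predicates) is an assumption chosen only for illustration, as are all names (`Rule`, `best_split`, `summarize`) and the toy data. Each rule is a conjunction of threshold predicates, and in each iteration the single split that most improves the assumed objective is applied.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Each predicate is (feature_index, op, threshold) with op in {"<=", ">"}.
    predicates: list
    rows: list  # indices of the data points covered by this rule


def majority_label(rows, labels):
    """Label (1 = outlier, 0 = inlier) held by most points the rule covers."""
    ones = sum(labels[i] for i in rows)
    return 1 if 2 * ones > len(rows) else 0


def majority_correct(rows, labels):
    """Number of covered points that agree with the rule's majority label."""
    ones = sum(labels[i] for i in rows)
    return max(ones, len(rows) - ones)


def objective(rules, labels, lam):
    """Assumed toy objective: accuracy minus lam * total predicate count."""
    correct = sum(majority_correct(r.rows, labels) for r in rules)
    length = sum(len(r.predicates) for r in rules)
    return correct / len(labels) - lam * length


def best_split(rules, X, labels, lam):
    """Return (score, rule_set) for the one split that best improves the objective.

    rule_set is None when no split beats the current rule set.
    """
    best = (objective(rules, labels, lam), None)
    for k, rule in enumerate(rules):
        for f in range(len(X[0])):
            values = sorted({X[i][f] for i in rule.rows})
            # Candidate thresholds are midpoints between consecutive values.
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                left = [i for i in rule.rows if X[i][f] <= t]
                right = [i for i in rule.rows if X[i][f] > t]
                candidate = rules[:k] + rules[k + 1:] + [
                    Rule(rule.predicates + [(f, "<=", t)], left),
                    Rule(rule.predicates + [(f, ">", t)], right),
                ]
                score = objective(candidate, labels, lam)
                if score > best[0]:
                    best = (score, candidate)
    return best


def summarize(X, labels, lam=0.02):
    """Start from one rule covering everything; split while the objective improves."""
    rules = [Rule([], list(range(len(X))))]
    while True:
        _, candidate = best_split(rules, X, labels, lam)
        if candidate is None:
            return rules
        rules = candidate


# Toy data: one feature; labels mimic the output of some outlier detector.
X = [[1.0], [2.0], [3.0], [4.0], [10.0], [11.0]]
labels = [0, 0, 0, 0, 1, 1]
rules = summarize(X, labels)
for r in rules:
    print(r.predicates, "->", majority_label(r.rows, labels))
```

On this toy input the loop stops after a single split, because any further split adds predicates without improving accuracy. Raising `lam` favors fewer, shorter rules; lowering it allows more rules in exchange for a closer fit to the detector's output.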

URL

https://arxiv.org/abs/2303.06261

PDF

https://arxiv.org/pdf/2303.06261.pdf