Abstract
Anomaly detection is critical for finding suspicious behavior in innumerable systems. We need to detect anomalies in real-time, i.e. determine if an incoming entity is anomalous or not, as soon as we receive it, to minimize the effects of malicious activities and start recovery as soon as possible. Therefore, online algorithms that can detect anomalies in a streaming manner are essential. We first propose MIDAS which uses a count-min sketch to detect anomalous edges in dynamic graphs in an online manner, using constant time and memory. We then propose two variants, MIDAS-R which incorporates temporal and spatial relations, and MIDAS-F which aims to filter away anomalous edges to prevent them from negatively affecting the internal data structures. We then extend the count-min sketch to a Higher-Order sketch to capture complex relations in graph data, and to reduce detecting suspicious dense subgraph problem to finding a dense submatrix in constant time. Using this sketch, we propose four streaming methods to detect edge and subgraph anomalies. Next, we broaden the graph setting to multi-aspect data. We propose MStream which detects explainable anomalies in multi-aspect data streams. We further propose MStream-PCA, MStream-IB, and MStream-AE to incorporate correlation between features. Finally, we consider multi-dimensional data streams with concept drift and propose MemStream. MemStream leverages the power of a denoising autoencoder to learn representations and a memory module to learn the dynamically changing trend in data without the need for labels. We prove a theoretical bound on the size of memory for effective drift handling. In addition, we allow quick retraining when the arriving stream becomes sufficiently different from the training data. Furthermore, MemStream makes use of two architecture design choices to be robust to memory poisoning.
Abstract (translated)
异常检测对于发现数量庞大的系统中的可疑行为至关重要。我们需要在实时的情况下检测异常,即一旦收到一个输入实体,就确定它是否存在异常,以最大限度地减少恶意活动的影响,并尽快开始恢复。因此,必须存在能够以流方式检测异常的在线算法。我们首先提出了MIDAS,该算法使用计数最小 sketch 在动态图中检测异常边,同时使用恒定时间和内存。我们提出了两个变体:MIDAS-R 包括时间空间关系,MIDAS-F 旨在过滤掉异常边,防止它们对内部数据结构产生负面影响。我们将计数最小 sketch 扩展为高阶 sketch,以捕捉图数据中的复杂关系,并降低检测可疑密集子图的问题,将其降低到恒定时间中找到密集子矩阵。使用这个 sketch 我们可以提出四个流方法来检测边和子异常。接下来,我们将 graph 设置扩展到多个维度。我们提出了 MStream,该算法在多个维度数据流中检测可解释的异常。我们进一步提出了 MStream-PCA、MStream-IB 和 MStream-AE,以整合特征之间的相关性。最后,我们考虑具有概念漂移的数据流,并提出了 MemStream。 MemStream 利用去噪自编码器的功率来学习表示和内存模块来学习数据的动态变化趋势,而无需标签。我们证明了内存处理的有效边界大小。此外,当输入流与训练数据足够不同时,我们允许快速重新训练。此外,MemStream 使用了两个架构设计选择,以鲁棒地应对内存中毒。
URL
https://arxiv.org/abs/2301.13199