Abstract
LLM-driven Anomaly Detection (AD) improves the understanding and explanation of anomalous behaviors in Time Series (TS). Existing methods suffer from inadequate reasoning ability, weak multi-turn dialogue capability, and narrow generalization. To address these issues, we 1) propose TSEvol, a multi-agent-based TS Evolution algorithm. Building on it, we 2) introduce TSEData-20K, a dataset for AD reasoning and multi-turn dialogue, and contribute the ChatAD family of chatbots for AD: ChatAD-Llama3-8B, ChatAD-Qwen2.5-7B, and ChatAD-Mistral-7B. Furthermore, 3) we propose TS Kahneman-Tversky Optimization (TKTO) to improve ChatAD's cross-task generalization. Finally, 4) we propose LLADBench, an LLM-driven learning-based AD benchmark, to evaluate ChatAD and nine baselines across seven datasets and tasks. Our three ChatAD models achieve substantial gains of up to 34.50% in accuracy and 34.71% in F1, along with a 37.42% reduction in false positives. In addition, with TKTO, the optimized ChatAD achieves competitive reasoning and cross-task generalization performance on classification, forecasting, and imputation.
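The abstract does not describe how TKTO changes its parent method, so the sketch below shows only the original Kahneman-Tversky Optimization (KTO) objective (Ethayarajh et al., 2024) as a point of reference, not the paper's TKTO. Each output y for prompt x gets an implied reward r = log π_θ(y|x) − log π_ref(y|x) and a binary desirable/undesirable label, and the loss is the mean of λ_y − v(x, y). Assumptions: sequence log-probabilities are already summed to scalars, the KL reference point z_0 is supplied precomputed and treated as a constant, and the names `Example`, `kto_loss`, and `kl_ref_point` are hypothetical.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


@dataclass
class Example:
    # Hypothetical container: scalar sequence log-probs for one (x, y) pair.
    policy_logp: float  # log pi_theta(y | x), summed over response tokens
    ref_logp: float     # log pi_ref(y | x) under the frozen reference model
    desirable: bool     # binary label: is y a desirable output for x?


def kto_loss(batch, kl_ref_point, beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Mean loss of the generic KTO objective (not the paper's TKTO).

    kl_ref_point is an externally estimated KL(pi_theta || pi_ref); it is
    clamped at zero and treated as a constant reference point z_0.
    """
    z0 = max(kl_ref_point, 0.0)
    total = 0.0
    for ex in batch:
        reward = ex.policy_logp - ex.ref_logp  # implied reward r_theta(x, y)
        if ex.desirable:
            # Gains: value rises as the reward exceeds the reference point.
            value = lambda_d * sigmoid(beta * (reward - z0))
            total += lambda_d - value
        else:
            # Losses: value rises as the reward falls below the reference point.
            value = lambda_u * sigmoid(beta * (z0 - reward))
            total += lambda_u - value
    return total / len(batch)


batch = [Example(-10.0, -12.0, True), Example(-15.0, -13.0, False)]
print(kto_loss(batch, kl_ref_point=0.0))
```

Because KTO needs only a per-output desirable/undesirable label rather than paired preferences, the weights `lambda_d` and `lambda_u` can be set to rebalance classes when one label dominates the data.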
URL
https://arxiv.org/abs/2601.13546