Abstract
In Artificial Intelligence for Information Technology Operations (AIOps), causal discovery is pivotal for constructing operation and maintenance graphs, which support downstream industrial tasks such as root cause analysis. Temporal causal discovery, an emerging approach, aims to identify temporal causal relationships between variables directly from observations by utilizing interventional data. However, existing methods focus mainly on synthetic datasets, rely heavily on known intervention targets, and ignore the textual information hidden in real-world systems, so they fail to carry out causal discovery in real industrial scenarios. To tackle this problem, we investigate temporal causal discovery in industrial scenarios, which poses two critical challenges: 1) how to discover causal relationships without interventional targets, which are costly to obtain in practice, and 2) how to discover causal relationships by leveraging the textual information in systems, which is complex yet abundant in industrial contexts. To address these challenges, we propose the RealTCD framework, which leverages domain knowledge to discover temporal causal relationships without interventional targets. Specifically, we first develop a score-based temporal causal discovery method that, through strategic masking and regularization, discovers causal relations for root cause analysis without relying on interventional targets. Furthermore, by employing Large Language Models (LLMs) to handle texts and integrate domain knowledge, we introduce LLM-guided meta-initialization, which extracts meta-knowledge from the textual information hidden in systems to boost the quality of discovery. Extensive experiments on simulated and real-world datasets show that RealTCD outperforms existing baselines in discovering temporal causal structures.
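To make the phrase "score-based temporal causal discovery with regularization" concrete, the following is a minimal sketch under strong simplifying assumptions, not the RealTCD method itself. It assumes a linear lag-1 model, scores a candidate graph by squared prediction error plus an L1 sparsity penalty, and minimizes that score with proximal gradient descent. The names `simulate`, `fit_lagged_graph`, `prior_mask`, and the threshold 0.3 are hypothetical and chosen for illustration; the `prior_mask` argument merely stands in for edges suggested by domain knowledge, and interventions and masking are omitted entirely.

```python
import numpy as np


def simulate(seed=0, steps=2000):
    """Generate a toy lag-1 linear time series (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    # w_true[i, j] != 0 means x_i at time t-1 influences x_j at time t.
    w_true = np.array([[0.5, 0.8, 0.0],
                       [0.0, 0.5, 0.8],
                       [0.0, 0.0, 0.5]])
    x = np.zeros((steps, 3))
    for t in range(1, steps):
        x[t] = x[t - 1] @ w_true + rng.normal(size=3)
    return x, w_true


def soft_threshold(w, tau):
    """Proximal operator of the L1 penalty: shrink every entry toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)


def fit_lagged_graph(x, prior_mask=None, lam=0.05, iters=500):
    """Minimize mean squared error / 2 + lam * L1(w) over lagged weights w."""
    past, present = x[:-1], x[1:]
    n, d = past.shape
    # Hypothetical warm start: small weights on edges proposed by a prior.
    if prior_mask is None:
        w = np.zeros((d, d))
    else:
        w = 0.1 * np.asarray(prior_mask, dtype=float)
    # Step size 1/L, where L is the Lipschitz constant of the loss gradient.
    lr = n / np.linalg.norm(past, 2) ** 2
    for _ in range(iters):
        grad = past.T @ (past @ w - present) / n
        w = soft_threshold(w - lr * grad, lr * lam)
    return w


if __name__ == "__main__":
    x, w_true = simulate()
    prior = np.eye(3, dtype=bool)  # hypothetical prior: only self-dependence
    w_hat = fit_lagged_graph(x, prior_mask=prior)
    print(np.abs(w_hat) > 0.3)  # estimated lagged edges after thresholding
```

Because every edge here points from time t-1 to time t, no acyclicity constraint is needed. Note also that this toy objective is convex, so the warm start only changes where the optimization begins; in non-convex settings such as neural score-based models, initialization could plausibly affect the result, which is the motivation the abstract gives for a knowledge-guided initialization.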
Abstract (translated)
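In the field of artificial intelligence for IT operations, causal discovery is essential for building operation and maintenance graphs and supports downstream industrial tasks such as root cause analysis. Temporal causal discovery, an emerging approach, aims to identify causal relationships between variables directly from observations by using interventional data. Existing methods, however, concentrate on synthetic datasets that depend on intervention targets and overlook the textual information hidden in real-world systems, so they cannot perform causal discovery in real industrial scenarios. To solve this problem, this paper studies temporal causal discovery in industrial scenarios, which faces two key challenges: 1) how to discover causal relationships without intervention targets, which are costly to obtain in practice, and 2) how to discover causal relationships using the complex but abundant textual information in industrial systems. To meet these challenges, we propose the RealTCD framework, which uses domain knowledge to discover temporal causal relationships when no intervention targets are available. Specifically, we first develop a score-based temporal causal discovery method that uses strategic masking and regularization to discover causal relationships for root cause analysis. In addition, by using Large Language Models (LLMs) to process text and integrate domain knowledge, we introduce LLM-guided meta-initialization, which extracts meta-knowledge from the textual information in systems to improve discovery quality. Extensive experiments on simulated and real-world datasets show that the proposed RealTCD framework outperforms existing baselines in discovering temporal causal structures.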
URL
https://arxiv.org/abs/2404.14786