Abstract
Concept-based explainable AI is a promising tool for improving the understanding of complex models on a given user's own terms, i.e., as a tool for personalized explainability. An important class of concept-based explainability methods builds on empirically defined concepts, which are specified indirectly through a set of positive and negative examples, as in the TCAV approach (Kim et al., 2018). While avoiding formal definitions of concepts and their operationalization is appealing to the user, it can be challenging to establish relevant concept datasets. Here, we address this challenge by using general knowledge graphs (such as Wikidata or WordNet) for comprehensive concept definition, and we present a workflow for user-driven data collection in both the text and image domains. The concepts derived from knowledge graphs are defined interactively, which provides an opportunity for personalization and ensures that the concepts reflect the user's intentions. We test the retrieved concept datasets on two concept-based explainability methods, namely concept activation vectors (CAVs) and concept activation regions (CARs) (Crabbe and van der Schaar, 2022). We show that CAVs and CARs based on these empirical concept datasets provide robust and accurate explanations. Importantly, we also find good alignment between the models' representations of concepts and the structure of knowledge graphs, i.e., human representations. This supports our conclusion that knowledge graph-based concepts are relevant for XAI.
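To make the idea of a concept defined only through positive and negative examples more concrete, the following is a minimal sketch of how a concept activation vector might be computed. It is not the authors' implementation: the activations are synthetic stand-ins for a model's hidden-layer outputs, the helper name `train_cav` is hypothetical, and a plain gradient-descent logistic regression is assumed as the linear classifier.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_cav(pos, neg, epochs=200, lr=0.1):
    """Hypothetical helper: fit a logistic-regression separator between
    positive and negative activations and return its unit-norm weight
    vector, used here as the concept activation vector."""
    dim = len(pos[0])
    w = [0.0] * dim
    b = 0.0
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = sigmoid(z) - y  # gradient of the log-loss w.r.t. z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def project(activation, cav):
    """Signed projection of an activation onto the concept direction."""
    return sum(a * c for a, c in zip(activation, cav))


# Synthetic 3-dimensional "activations" (an assumption for illustration):
# concept examples are shifted along the first axis, the rest is noise.
rng = random.Random(0)
pos = [[rng.gauss(1.0, 0.5), rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)]
       for _ in range(50)]
neg = [[rng.gauss(-1.0, 0.5), rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)]
       for _ in range(50)]

cav = train_cav(pos, neg)
mean_pos = sum(project(x, cav) for x in pos) / len(pos)
mean_neg = sum(project(x, cav) for x in neg) / len(neg)
print("CAV:", [round(c, 3) for c in cav])
print("mean projection (positive, negative):",
      round(mean_pos, 3), round(mean_neg, 3))
```

In this sketch the quality of the resulting vector depends entirely on the positive and negative example sets, which is the part of the pipeline that the knowledge graph-based data collection described above is meant to supply.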
URL
https://arxiv.org/abs/2404.07008