In the field of Question Answering (QA), unifying large language models (LLMs) with external databases has shown great success. However, these methods often fall short in providing the advanced reasoning needed for complex QA tasks. To address these issues, we build on a novel approach called Knowledge Graph Prompting (KGP), which combines knowledge graphs with an LLM-based agent to improve reasoning and search accuracy. Nevertheless, the original KGP framework requires costly fine-tuning on large datasets and still suffers from LLM hallucination. We therefore propose a reasoning-infused LLM agent to enhance this framework. The agent mimics human curiosity, asking follow-up questions to navigate the search more efficiently. This simple modification significantly boosts LLM performance on QA tasks without the high costs and latency of the initial KGP framework. Our ultimate goal is to further develop this approach, leading to more accurate, faster, and more cost-effective solutions in the QA domain.
https://arxiv.org/abs/2404.09077
The Knowledge Graph Entity Typing (KGET) task aims to predict missing type annotations for entities in knowledge graphs. Recent works only utilize the \textit{\textbf{structural knowledge}} in the local neighborhood of entities, disregarding \textit{\textbf{semantic knowledge}} in the textual representations of entities, relations, and types that are also crucial for type inference. Additionally, we observe that the interaction between semantic and structural knowledge can be utilized to address the false-negative problem. In this paper, we propose a novel \textbf{\underline{S}}emantic and \textbf{\underline{S}}tructure-aware KG \textbf{\underline{E}}ntity \textbf{\underline{T}}yping~{(SSET)} framework, which is composed of three modules. First, the \textit{Semantic Knowledge Encoding} module encodes factual knowledge in the KG with a Masked Entity Typing task. Then, the \textit{Structural Knowledge Aggregation} module aggregates knowledge from the multi-hop neighborhood of entities to infer missing types. Finally, the \textit{Unsupervised Type Re-ranking} module utilizes the inference results from the two models above to generate type predictions that are robust to false-negative samples. Extensive experiments show that SSET significantly outperforms existing state-of-the-art methods.
https://arxiv.org/abs/2404.08313
The integration of Large Language Models (LLMs) and knowledge graphs (KGs) has achieved remarkable success in various natural language processing tasks. However, existing methodologies that integrate LLMs and KGs often navigate the task-solving process solely based on the LLM's analysis of the question, overlooking the rich cognitive potential inherent in the vast knowledge encapsulated in KGs. To address this, we introduce Observation-Driven Agent (ODA), a novel AI agent framework tailored for tasks involving KGs. ODA incorporates KG reasoning abilities via global observation, enhancing reasoning through a cyclical paradigm of observation, action, and reflection. To confront the exponential explosion of knowledge during observation, we design a recursive observation mechanism and integrate the observed knowledge into the action and reflection modules. Through extensive experiments, ODA demonstrates state-of-the-art performance on several datasets, notably achieving accuracy improvements of 12.87% and 8.9%.
https://arxiv.org/abs/2404.07677
Knowledge graphs are useful tools to organize, recommend, and sort data. Hierarchies in knowledge graphs provide significant benefit in improving understanding and compartmentalization of the data within a knowledge graph. This work leverages large language models to generate and augment hierarchies in an existing knowledge graph. For small (<100,000 node) domain-specific KGs, we find that a combination of few-shot prompting with one-shot generation works well, while larger KGs may require cyclical generation. We present techniques for augmenting hierarchies, which increased coverage by 98% for intents and 99% for colors in our knowledge graph.
https://arxiv.org/abs/2404.08020
Recently, large language models (LLMs) have demonstrated remarkable potential as intelligent agents. However, existing research mainly focuses on enhancing the agent's reasoning or decision-making abilities through well-designed prompt engineering or task-specific fine-tuning, ignoring the procedure of exploration and exploitation. When addressing complex tasks within open-world interactive environments, these methods exhibit limitations. First, the lack of global information about the environment leads to greedy decisions and sub-optimal solutions. Second, irrelevant information acquired from the environment not only introduces noise but also incurs additional cost. This paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), to enhance LLM agents in solving open-world interactive tasks. Concretely, WESE decouples the exploration and exploitation process, employing a cost-effective weak agent to perform exploration tasks for global knowledge. A knowledge graph-based strategy is then introduced to store the acquired knowledge and extract task-relevant knowledge, improving the stronger agent's success rate and efficiency on the exploitation task. Our approach is flexible enough to incorporate diverse tasks, and obtains significant improvements in both success rates and efficiency across four interactive benchmarks.
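The knowledge graph-based strategy can be sketched as follows; the class and method names are illustrative, not the paper's API. A cheap weak agent records its exploration results as triples, and the stronger agent later retrieves only the neighborhood relevant to its task entities, filtering out the noise the abstract warns about:

```python
# Hedged sketch of WESE-style decoupling: a weak explorer stores
# (head, relation, tail) triples; the strong agent queries only the
# task-relevant subgraph. All names here are hypothetical.
from collections import defaultdict

class KnowledgeStore:
    def __init__(self):
        self.by_entity = defaultdict(list)  # entity -> incident triples

    def add(self, head, relation, tail):
        t = (head, relation, tail)
        self.by_entity[head].append(t)
        self.by_entity[tail].append(t)

    def relevant(self, entities, hops=1):
        """Return triples touching any task entity, expanded `hops` times."""
        frontier, seen = set(entities), set()
        for _ in range(hops):
            nxt = set()
            for e in frontier:
                for h, r, t in self.by_entity.get(e, []):
                    if (h, r, t) not in seen:
                        seen.add((h, r, t))
                        nxt.update((h, t))
            frontier = nxt - frontier
        return sorted(seen)

store = KnowledgeStore()
store.add("kitchen", "contains", "knife")
store.add("knife", "used_for", "cutting")
store.add("garden", "contains", "shovel")
# Only kitchen-related triples come back; the garden branch is filtered out.
print(store.relevant(["kitchen"], hops=2))
```

Keeping retrieval entity-anchored like this is what lets the strong agent avoid both greedy local decisions (it sees multi-hop context) and the cost of processing irrelevant observations.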
https://arxiv.org/abs/2404.07456
Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries comprised of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on, requiring substantial training time before deployment on a new graph. Here we present UltraQuery, an inductive reasoning model that can answer logical queries zero-shot on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions which generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG reasoning model, UltraQuery can solve CLQA on any KG even if it is only finetuned on a single dataset. Experimenting on 23 datasets, UltraQuery in the zero-shot inference mode shows competitive or better query answering performance than the best available baselines and sets a new state of the art on 14 of them.
https://arxiv.org/abs/2404.07198
Concept-based explainable AI is promising as a tool to improve the understanding of complex models at the premises of a given user, viz.\ as a tool for personalized explainability. An important class of concept-based explainability methods is constructed with empirically defined concepts, indirectly defined through a set of positive and negative examples, as in the TCAV approach (Kim et al., 2018). While it is appealing to the user to avoid formal definitions of concepts and their operationalization, it can be challenging to establish relevant concept datasets. Here, we address this challenge using general knowledge graphs (such as, e.g., Wikidata or WordNet) for comprehensive concept definition and present a workflow for user-driven data collection in both text and image domains. The concepts derived from knowledge graphs are defined interactively, providing an opportunity for personalization and ensuring that the concepts reflect the user's intentions. We test the retrieved concept datasets on two concept-based explainability methods, namely concept activation vectors (CAVs) and concept activation regions (CARs) (Crabbe and van der Schaar, 2022). We show that CAVs and CARs based on these empirical concept datasets provide robust and accurate explanations. Importantly, we also find good alignment between the models' representations of concepts and the structure of knowledge graphs, i.e., human representations. This supports our conclusion that knowledge graph-based concepts are relevant for XAI.
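As a rough illustration of the empirically defined concepts this line of work builds on, the sketch below approximates a concept activation vector as the difference of activation means between positive and negative concept examples. The TCAV approach fits a linear classifier instead, so the mean-difference direction and the toy activations here are a simplified stand-in:

```python
# Simplified CAV sketch (assumption: mean-difference direction as a
# stand-in for TCAV's linear classifier; toy 2-d activations).
def mean(vecs):
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def cav(pos_activations, neg_activations):
    """Direction in activation space pointing from 'not concept' to 'concept'."""
    mp, mn = mean(pos_activations), mean(neg_activations)
    return [p - q for p, q in zip(mp, mn)]

def concept_sensitivity(gradient, cav_vec):
    """Directional derivative of the model output along the concept direction."""
    return sum(g * c for g, c in zip(gradient, cav_vec))

pos = [[1.0, 0.0], [0.8, 0.2]]   # activations for positive concept examples (toy)
neg = [[0.0, 1.0], [0.2, 0.8]]   # activations for negative examples (toy)
v = cav(pos, neg)
print(v)                          # ≈ [0.8, -0.8]
print(concept_sensitivity([0.5, -0.5], v))
```

A knowledge-graph-derived concept dataset would simply supply the positive and negative example sets here, which is why interactive, user-driven concept definition plugs in cleanly.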
https://arxiv.org/abs/2404.07008
The purpose of emotion-cause pair extraction (ECPE) is to extract pairs of emotion clauses and cause clauses. On the one hand, existing methods do not fully account for the relationship between ECPE and its two auxiliary tasks, emotion extraction and cause extraction. On the other hand, existing two-stage models suffer from error propagation. In addition, existing models do not adequately address the positional imbalance of emotion and cause clauses across samples. To solve these problems, we propose an end-to-end multitask model (MM-ECPE) based on shared interaction among GRU, knowledge graph, and Transformer modules. Furthermore, building on MM-ECPE, and to let the encoder layer better handle the imbalanced distribution of distances between cause clauses and emotion clauses, we propose MM-ECPE(BERT), an emotion-cause pair retrieval model whose encoding layer combines BERT, a sentiment lexicon, and a position-aware interaction module. The model first fully models the interaction between tasks through a multi-level sharing module, mining the information shared among emotion-cause pair extraction, emotion extraction, and cause extraction. Second, to address the imbalanced distribution of emotion and cause clauses, suitable labels are screened out according to knowledge graph path length, and task-specific features are constructed so that the model can focus on extracting pairs with genuine emotion-cause relationships. Experimental results on the ECPE benchmark dataset show that the proposed model achieves good performance, especially on position-imbalanced samples.
https://arxiv.org/abs/2404.06812
Sourcing and identification of new manufacturing partners is crucial for manufacturing system integrators to enhance agility and reduce risk through supply chain diversification in the global economy. The advent of advanced large language models has captured significant interest, due to their ability to generate comprehensive and articulate responses across a wide range of knowledge domains. However, these models often fall short in accuracy and completeness when responding to domain-specific inquiries, particularly in areas like manufacturing service discovery. This research explores the potential of leveraging Knowledge Graphs in conjunction with ChatGPT to streamline the process for prospective clients in identifying small manufacturing enterprises. In this study, we propose a method that integrates bottom-up ontology with advanced machine learning models to develop a Manufacturing Service Knowledge Graph from an array of structured and unstructured data sources, including the digital footprints of small-scale manufacturers throughout North America. The Knowledge Graph and the learned graph embedding vectors are leveraged to tackle intricate queries within the digital supply chain network, responding with enhanced reliability and greater interpretability. The approach highlighted is scalable to millions of entities that can be distributed to form a global Manufacturing Service Knowledge Network Graph that can potentially interconnect multiple types of Knowledge Graphs spanning industry sectors, geopolitical boundaries, and business domains. The dataset developed for this study, now publicly accessible, encompasses more than 13,000 manufacturers' weblinks, manufacturing services, certifications, and location entity types.
https://arxiv.org/abs/2404.06571
Adverse drug reactions considerably impact patient outcomes and healthcare costs in cancer therapy. Using artificial intelligence to predict adverse drug reactions in real time could revolutionize oncology treatment. This study aims to assess the performance of artificial intelligence models in predicting adverse drug reactions in patients with cancer. This is the first systematic review and meta-analysis on the topic. Scopus, PubMed, IEEE Xplore, and ACM Digital Library databases were searched for studies in English, French, and Arabic from January 1, 2018, to August 20, 2023. The inclusion criteria were: (1) peer-reviewed research articles; (2) use of artificial intelligence algorithms (machine learning, deep learning, knowledge graphs); (3) study aimed to predict adverse drug reactions (cardiotoxicity, neutropenia, nephrotoxicity, hepatotoxicity); (4) study was on cancer patients. Data were extracted, and study quality evaluated, by three reviewers. Of the 332 screened articles, 17 studies (5%) involving 93,248 oncology patients from 17 countries were included in the systematic review, ten of which were synthesized in the meta-analysis. A random-effects model was created to pool the sensitivity, specificity, and AUC of the included studies. The pooled results were 0.82 (95% CI: 0.69, 0.9), 0.84 (95% CI: 0.75, 0.9), and 0.83 (95% CI: 0.77, 0.87) for sensitivity, specificity, and AUC, respectively, of ADR predictive models. Biomarkers proved their effectiveness in predicting ADRs, yet they were adopted by only half of the reviewed studies. The use of AI in cancer treatment shows great potential, with models demonstrating high specificity and sensitivity in predicting ADRs. However, standardized research and multicenter studies are needed to improve the quality of evidence. AI can enhance cancer patient care by bridging the gap between data-driven insights and clinical expertise.
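The pooled estimates come from a random-effects model; a minimal sketch of the standard DerSimonian-Laird pooling procedure, with made-up study-level inputs rather than the review's actual data, looks like this:

```python
# Illustrative DerSimonian-Laird random-effects pooling (toy inputs,
# not the review's study-level data).
def dersimonian_laird(effects, variances):
    k = len(effects)
    w = [1.0 / v for v in variances]                    # fixed-effect weights
    sw = sum(w)
    mean_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mean_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = (1.0 / sum(w_re)) ** 0.5
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Three hypothetical studies' sensitivities with their variances:
pooled, ci = dersimonian_laird([0.80, 0.84, 0.78], [0.010, 0.020, 0.015])
print(round(pooled, 3), [round(x, 3) for x in ci])
```

The between-study variance `tau2` is what distinguishes a random-effects pool from a fixed-effect one; it widens the confidence interval when studies disagree, which is appropriate for heterogeneous multicenter evidence like this.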
https://arxiv.org/abs/2404.05762
Knowledge graph completion (KGC) aims to alleviate the inherent incompleteness of knowledge graphs (KGs), which is a critical task for various applications, such as recommendations on the web. Although knowledge graph embedding (KGE) models have demonstrated superior predictive performance on KGC tasks, these models infer missing links in a black-box manner that lacks transparency and accountability, preventing researchers from developing accountable models. Existing KGE-based explanation methods focus on exploring key paths or isolated edges as explanations, which provide too little information to explain the target prediction. Additionally, the absence of ground truth makes these explanation methods ineffective at quantitatively evaluating the explored explanations. To overcome these limitations, we propose KGExplainer, a model-agnostic method that identifies connected subgraph explanations and distills an evaluator to assess them quantitatively. KGExplainer employs a perturbation-based greedy search algorithm to find key connected subgraphs as explanations within the local structure of target predictions. To evaluate the quality of the explored explanations, KGExplainer distills an evaluator from the target KGE model. By forwarding the explanations to the evaluator, our method can examine their fidelity. Extensive experiments on benchmark datasets demonstrate that KGExplainer yields promising improvement and achieves an optimal ratio of 83.3% in human evaluation.
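A minimal sketch of perturbation-based greedy subgraph search in this spirit: edges whose removal most reduces a prediction score are treated as most important, and the top edges are grown into a connected explanation. The scoring function below is a toy stand-in for the target KGE model, and all names are illustrative:

```python
# Hedged sketch: perturbation importance + greedy connected selection.
# `score` is a hypothetical stand-in for a KGE model's prediction score.
def importance(edges, score):
    """Score drop caused by deleting each edge individually."""
    base = score(edges)
    return {e: base - score([x for x in edges if x != e]) for e in edges}

def greedy_connected_subgraph(edges, score, k, seed_entities):
    imp = importance(edges, score)
    chosen, touched = [], set(seed_entities)
    for e in sorted(edges, key=lambda e: -imp[e]):
        h, _, t = e
        if h in touched or t in touched:      # keep the explanation connected
            chosen.append(e)
            touched.update((h, t))
        if len(chosen) == k:
            break
    return chosen

edges = [("a", "r1", "b"), ("b", "r2", "c"), ("d", "r3", "e")]
# Toy score: edges on the a--c path contribute 1.0, others 0.1.
score = lambda es: sum(1.0 if e[0] in ("a", "b") else 0.1 for e in es)
print(greedy_connected_subgraph(edges, score, k=2, seed_entities={"a", "c"}))
```

The connectivity constraint is the key difference from path- or edge-level explainers: the selected edges must form one subgraph around the predicted triple, not a bag of disconnected fragments.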
https://arxiv.org/abs/2404.03893
In this work, we are interested in automated methods for knowledge graph creation (KGC) from input text. Progress on large language models (LLMs) has prompted a series of recent works applying them to KGC, e.g., via zero/few-shot prompting. Despite successes on small domain-specific datasets, these models face difficulties scaling up to text common in many real-world applications. A principal issue is that in prior methods, the KG schema has to be included in the LLM prompt to generate valid triplets; larger and more complex schema easily exceed the LLMs' context window length. To address this problem, we propose a three-phase framework named Extract-Define-Canonicalize (EDC): open information extraction followed by schema definition and post-hoc canonicalization. EDC is flexible in that it can be applied to settings where a pre-defined target schema is available and when it is not; in the latter case, it constructs a schema automatically and applies self-canonicalization. To further improve performance, we introduce a trained component that retrieves schema elements relevant to the input text; this improves the LLMs' extraction performance in a retrieval-augmented generation-like manner. We demonstrate on three KGC benchmarks that EDC is able to extract high-quality triplets without any parameter tuning and with significantly larger schemas compared to prior works.
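A toy rendering of the three EDC phases on a single sentence, with naive whitespace-based open extraction and string similarity standing in for the trained schema retriever (all names and the schema are illustrative, not the paper's implementation):

```python
# Hedged sketch of Extract-Define-Canonicalize on one sentence.
from difflib import SequenceMatcher

def extract(text):
    # Phase 1 (open IE): naive "subject verb object" split, for illustration
    # only; the paper uses an LLM for open information extraction.
    s, v, o = text.rstrip(".").split(" ", 2)
    return [(s, v, o)]

def canonicalize(triples, schema):
    # Phase 3: snap each open relation onto the closest schema relation;
    # string similarity stands in for the trained retrieval component.
    out = []
    for h, r, t in triples:
        best = max(schema, key=lambda s: SequenceMatcher(None, r, s).ratio())
        out.append((h, best, t))
    return out

schema = ["founded", "located_in", "employs"]   # Phase 2: target schema
open_triples = extract("Hinton founded DNNresearch.")
print(canonicalize(open_triples, schema))
```

Separating extraction from canonicalization is what keeps the schema out of the extraction prompt, which is precisely how EDC sidesteps the context-window limit that large schemas hit in prior prompt-the-whole-schema methods.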
https://arxiv.org/abs/2404.03868
Large Language Models (LLMs) demonstrate an impressive capacity to recall a vast range of common factual knowledge information. However, unravelling the underlying reasoning of LLMs and explaining their internal mechanisms of exploiting this factual knowledge remain active areas of investigation. Our work analyzes the factual knowledge encoded in the latent representation of LLMs when prompted to assess the truthfulness of factual claims. We propose an end-to-end framework that jointly decodes the factual knowledge embedded in the latent space of LLMs from a vector space to a set of ground predicates and represents its evolution across the layers using a temporal knowledge graph. Our framework relies on the technique of activation patching which intervenes in the inference computation of a model by dynamically altering its latent representations. Consequently, we neither rely on external models nor training processes. We showcase our framework with local and global interpretability analyses using two claim verification datasets: FEVER and CLIMATE-FEVER. The local interpretability analysis exposes different latent errors from representation to multi-hop reasoning errors. On the other hand, the global analysis uncovered patterns in the underlying evolution of the model's factual knowledge (e.g., store-and-seek factual information). By enabling graph-based analyses of the latent representations, this work represents a step towards the mechanistic interpretability of LLMs.
https://arxiv.org/abs/2404.03623
Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications because they link related entities and give context-rich information, supporting efficient information retrieval and knowledge discovery; presenting information flow in a very effective manner. Despite being widely used globally, Bangla is relatively underrepresented in KGs due to a lack of comprehensive datasets, encoders, NER (named entity recognition) models, POS (part-of-speech) taggers, and lemmatizers, hindering efficient information processing and reasoning applications in the language. Addressing the KG scarcity in Bengali, we propose BanglaAutoKG, a pioneering framework that is able to automatically construct Bengali KGs from any Bangla text. We utilize multilingual LLMs to understand various languages and correlate entities and relations universally. By employing a translation dictionary to identify English equivalents and extracting word features from pre-trained BERT models, we construct the foundational KG. To reduce noise and align word embeddings with our goal, we employ graph-based polynomial filters. Lastly, we implement a GNN-based semantic filter, which elevates contextual understanding and trims unnecessary edges, culminating in the formation of the definitive KG. Empirical findings and case studies demonstrate the universal effectiveness of our model, capable of autonomously constructing semantically enriched KGs from any text.
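The graph-based polynomial filter mentioned above can be sketched as computing sum_k c_k A^k X over the adjacency matrix A and node features X. The coefficients and the two-node graph below are illustrative, not BanglaAutoKG's actual values:

```python
# Sketch of a graph polynomial filter for denoising node embeddings:
# filtered features = sum_k c_k * A^k X. Toy graph and coefficients.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def poly_filter(A, X, coeffs):
    out = [[0.0] * len(X[0]) for _ in X]
    # P walks through the powers A^0 (identity), A^1, A^2, ...
    P = [[1.0 if i == j else 0.0 for j in range(len(A))] for i in range(len(A))]
    for c in coeffs:
        AX = matmul(P, X)
        out = [[o + c * v for o, v in zip(ro, rv)] for ro, rv in zip(out, AX)]
        P = matmul(P, A)
    return out

A = [[0.0, 1.0], [1.0, 0.0]]          # two mutually connected nodes
X = [[1.0], [3.0]]                    # 1-d node embeddings
print(poly_filter(A, X, [0.5, 0.5]))  # each node averaged with its neighbor
```

With coefficients [0.5, 0.5] the filter blends each node's embedding with its neighborhood, which is the smoothing/denoising effect the abstract describes before the GNN-based semantic filter prunes edges.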
https://arxiv.org/abs/2404.03528
Artificial Intelligence applications gradually move outside the safe walls of research labs and into our daily lives. This is also true for Machine Learning methods on Knowledge Graphs, whose application has increased steadily since the beginning of the 21st century. However, in many applications, users require an explanation of the Artificial Intelligence's decision. This has led to increased demand for Comprehensible Artificial Intelligence. Knowledge Graphs epitomize fertile soil for Comprehensible Artificial Intelligence, due to their ability to display connected data, i.e. knowledge, in a human- as well as machine-readable way. This survey gives a short history of Comprehensible Artificial Intelligence on Knowledge Graphs. Furthermore, we contribute by arguing that the concept of Explainable Artificial Intelligence is overloaded and overlaps with Interpretable Machine Learning. By introducing the parent concept of Comprehensible Artificial Intelligence, we provide a clear-cut distinction between both concepts while accounting for their similarities. Thus, this survey makes a case for Comprehensible Artificial Intelligence on Knowledge Graphs as consisting of Interpretable Machine Learning on Knowledge Graphs and Explainable Artificial Intelligence on Knowledge Graphs, and introduces a novel taxonomy for the field. In addition, a comprehensive overview of the research on Comprehensible Artificial Intelligence on Knowledge Graphs is presented and put into the context of the taxonomy. Finally, research gaps in the field are identified for future research.
https://arxiv.org/abs/2404.03499
With the rise of computational social science, many scholars utilize data analysis and natural language processing tools to analyze social media, news articles, and other accessible data sources for examining political and social discourse. In particular, the emergence of echo chambers due to the dissemination of specific information has become a topic of interest in mixed-methods research. In this paper, we analyze data collected from two news portals, Breitbart News (BN) and the New York Times (NYT), to test the hypothesis that the formation of echo chambers can be partially explained at the level of individual information consumption rather than by the collective topology of individuals' social networks. Our research findings are presented through knowledge graphs built from a dataset spanning 11.5 years gathered from the BN and NYT media portals. We demonstrate that applying knowledge representation techniques to these news streams reveals, contrary to common assumptions, a relative "internal" neutrality of both sources and a polarizing attitude towards only a small fraction of entities. Additionally, we argue that such characteristics in information sources lead to fundamental disparities in audience worldviews, potentially acting as a catalyst for the formation of echo chambers.
https://arxiv.org/abs/2404.03437
Recommender systems (RSs) are designed to provide personalized recommendations to users. Recently, knowledge graphs (KGs) have been widely introduced in RSs to improve recommendation accuracy. In this study, however, we demonstrate that RSs do not necessarily perform worse even if the KG is downgraded to the user-item interaction graph only (or removed). We propose an evaluation framework KG4RecEval to systematically evaluate how much a KG contributes to the recommendation accuracy of a KG-based RS, using our defined metric KGER (KG utilization efficiency in recommendation). We consider the scenarios where knowledge in a KG gets completely removed, randomly distorted and decreased, and also where recommendations are for cold-start users. Our extensive experiments on four commonly used datasets and a number of state-of-the-art KG-based RSs reveal that: to remove, randomly distort or decrease knowledge does not necessarily decrease recommendation accuracy, even for cold-start users. These findings inspire us to rethink how to better utilize knowledge from existing KGs, whereby we discuss and provide insights into what characteristics of datasets and KG-based RSs may help improve KG utilization efficiency.
https://arxiv.org/abs/2404.03164
The convergence of materials science and artificial intelligence has unlocked new opportunities for gathering, analyzing, and generating novel materials sourced from extensive scientific literature. Despite the potential benefits, persistent challenges such as manual annotation, precise extraction, and traceability remain. Large language models have emerged as promising solutions to these obstacles. This paper introduces the Functional Materials Knowledge Graph (FMKG), a multidisciplinary materials science knowledge graph. Using advanced natural language processing techniques, we extract millions of entities from a corpus comprising all high-quality research papers published in the last decade and assemble them into triples. FMKG organizes unstructured information into distinct labels covering Name, Formula, Acronym, Structure/Phase, Properties, Descriptor, Synthesis, Characterization Method, Application, and Domain, seamlessly integrating papers' Digital Object Identifiers. As the latest structured database for functional materials, FMKG acts as a powerful catalyst for expediting the development of functional materials and a foundation for building a more comprehensive material knowledge graph from full paper text. Furthermore, our research lays the groundwork for practical text-mining-based knowledge management systems, applicable not only to intricate materials systems but also to other specialized domains.
https://arxiv.org/abs/2404.03080
Structured data, prevalent in tables, databases, and knowledge graphs, poses a significant challenge in its representation. With the advent of large language models (LLMs), there has been a shift towards linearization-based methods, which process structured data as sequential token streams, diverging from approaches that explicitly model structure, often as a graph. Crucially, there remains a gap in our understanding of how these linearization-based methods handle structured data, which is inherently non-linear. This work investigates the linear handling of structured data in encoder-decoder language models, specifically T5. Our findings reveal the model's ability to mimic human-designed processes such as schema linking and syntax prediction, indicating a deep, meaningful learning of structure beyond simple token sequencing. We also uncover insights into the model's internal mechanisms, including the ego-centric nature of structure node encodings and the potential for model compression due to modality fusion redundancy. Overall, this work sheds light on the inner workings of linearization-based methods and could potentially provide guidance for future research.
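As an example of the linearization the abstract refers to, a small table can be flattened into a single token stream for a seq2seq model such as T5. The delimiter scheme below is one common convention for table linearization, not a fixed standard:

```python
# Flatten a table into one sequential token stream (illustrative delimiters).
def linearize_table(header, rows):
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, 1):
        parts.append(f"row {i} : " + " | ".join(str(c) for c in row))
    return " ".join(parts)

print(linearize_table(["city", "pop"], [["Oslo", 709037], ["Bergen", 291940]]))
```

The point the paper probes is exactly what gets lost (or implicitly re-learned) in this step: the row/column structure is no longer explicit in the token stream, yet the model appears to recover structural operations like schema linking from it.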
https://arxiv.org/abs/2404.02389
Knowledge graphs play a pivotal role in various applications, such as question-answering and fact-checking. Abstract Meaning Representation (AMR) represents text as knowledge graphs. Evaluating the quality of these graphs involves matching them structurally to each other and semantically to the source text. Existing AMR metrics are inefficient and struggle to capture semantic similarity. We also lack a systematic evaluation benchmark for assessing structural similarity between AMR graphs. To overcome these limitations, we introduce a novel AMR similarity metric, rematch, alongside a new evaluation for structural similarity called RARE. Among state-of-the-art metrics, rematch ranks second in structural similarity; and first in semantic similarity by 1--5 percentage points on the STS-B and SICK-R benchmarks. Rematch is also five times faster than the next most efficient metric.
https://arxiv.org/abs/2404.02126