Abstract
Knowledge graphs (KGs) are large, structured datasets representing knowledge bases (KBs), where each node is a key entity and the relations among entities are typed edges. Answering a natural language query over a KB entails starting from specific nodes and reasoning over multiple edges of the corresponding KG to arrive at the correct set of answer nodes. Traditional approaches to question answering over KGs are either (a) semantic-parsing (SP) based, where a logical form (e.g., an S-expression or SPARQL query) is generated using node and edge embeddings and reasoning is then performed over these representations, or a language model is tuned to generate the final answer directly, or (b) information-retrieval (IR) based, which extracts entities and relations sequentially. In this work, we evaluate the capability of large language models (LLMs) to answer multi-hop questions over KGs. We show that, depending on the size and nature of the KG, different approaches are needed to extract the relevant information and feed it to an LLM, since every LLM comes with a fixed context window. We evaluate our approach on six KGs, with and without the availability of example-specific subgraphs, and show that LLMs can adopt both IR- and SP-based methods, yielding highly competitive performance.
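The multi-hop reasoning the abstract describes can be sketched with a toy example (the entities, relations, and question below are illustrative, not from the paper): a question such as "Where was the director of Inception born?" starts from one node and follows a chain of typed edges to the answer node.

```python
# Toy KG as a list of (head, relation, tail) triples -- hypothetical data.
edges = [
    ("Inception", "director", "Christopher_Nolan"),
    ("Christopher_Nolan", "born_in", "London"),
    ("Interstellar", "director", "Christopher_Nolan"),
]

def neighbors(node, relation):
    """All tail nodes reachable from `node` via an edge typed `relation`."""
    return [t for h, r, t in edges if h == node and r == relation]

def multi_hop(start, relations):
    """Follow a chain of typed relations from a start entity.

    Each hop replaces the current frontier with the set of nodes
    reachable via the next relation in the chain.
    """
    frontier = {start}
    for rel in relations:
        frontier = {t for n in frontier for t in neighbors(n, rel)}
    return frontier

# Two hops: film --director--> person --born_in--> place.
print(multi_hop("Inception", ["director", "born_in"]))  # {'London'}
```

An SP-based system would instead compile the question into a logical form (e.g., a SPARQL query) executed by the KB engine, while an IR-based system retrieves the relevant entities and relations step by step, as sketched here.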
URL
https://arxiv.org/abs/2404.19234