
Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering

2024-04-16 08:28:16
Yuqi Wang, Boran Jiang, Yi Luo, Dawei He, Peng Cheng, Liangcai Gao

Abstract

Large language models (LLMs) such as GPT-3.5, GPT-4, and LLaMA 2 perform surprisingly well and outperform human experts on many tasks. However, in many domain-specific evaluations these LLMs often suffer from hallucination because they have not been trained on enough relevant corpora. Moreover, fine-tuning large models can be impractical: the LLMs may not be open source, and constructing high-quality domain instructions is difficult. Structured knowledge bases such as knowledge graphs can therefore supply domain background knowledge to LLMs while making full use of their reasoning and analysis capabilities. In some previous works, the LLM is called multiple times when retrieving a subgraph for a question, once to decide whether each candidate triplet should be included in the subgraph. Especially for questions that require a multi-hop reasoning path, these frequent LLM calls consume a lot of computation. Furthermore, when choosing the reasoning path, the LLM is called once per step, so an incorrect choice at any step causes errors to accumulate in the following steps. In this paper, we integrate and optimize a pipeline for selecting reasoning paths from a KG with the help of an LLM, which reduces the dependency on the LLM. In addition, we propose a simple and effective subgraph retrieval method based on chain of thought (CoT) and PageRank that returns the paths most likely to contain the answer. We conduct experiments on three datasets: GenMedGPT-5k [14], WebQuestions [2], and CMCQA [21]. The results show that RoK achieves the same performance as previous SOTA models while using fewer LLM calls.
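The retrieval idea described in the abstract (seed the knowledge graph with entities mentioned in a chain-of-thought draft answer, then rank nodes with personalized PageRank to keep the paths most likely to contain the answer) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the hard-coded `seed_entities`, and the score threshold are assumptions made purely for illustration.

```python
# Minimal sketch (not the paper's code): rank KG nodes with personalized
# PageRank, seeded by entities extracted from a chain-of-thought answer,
# then keep triples whose head and tail both score highly.
import networkx as nx

# Toy medical knowledge graph; node and relation names are invented.
kg = nx.DiGraph()
kg.add_edge("jaundice", "liver_disease", relation="symptom_of")
kg.add_edge("liver_disease", "hepatologist", relation="treated_by")
kg.add_edge("liver_disease", "liver_function_test", relation="diagnosed_by")
kg.add_edge("headache", "migraine", relation="symptom_of")

# Entities appearing in the question and the CoT draft answer.
# In the paper these would come from an LLM; here they are hard-coded.
seed_entities = ["jaundice", "liver_disease"]
personalization = {n: (1.0 if n in seed_entities else 0.0) for n in kg.nodes}

# Personalized PageRank biases the random walk toward the seed entities.
scores = nx.pagerank(kg.to_undirected(), alpha=0.85,
                     personalization=personalization)

# Keep triples whose endpoints both rank above a (hypothetical) threshold.
THRESHOLD = 0.05
subgraph_triples = [
    (h, d["relation"], t)
    for h, t, d in kg.edges(data=True)
    if scores[h] > THRESHOLD and scores[t] > THRESHOLD
]
print(subgraph_triples)
```

With this toy graph, the unrelated "headache"/"migraine" branch receives almost no PageRank mass and is pruned, while the chain from the symptom to the treating specialist survives, giving a candidate reasoning path without calling the LLM once per triplet.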


URL

https://arxiv.org/abs/2404.10384

PDF

https://arxiv.org/pdf/2404.10384.pdf

