Abstract
Advancements in natural language processing have revolutionized the way we interact with digital information systems, such as databases, making them more accessible. However, challenges persist, especially when accuracy is critical, as in the biomedical domain. A key issue is the hallucination problem, where models generate information unsupported by the underlying data, potentially leading to dangerous misinformation. This paper presents a novel approach designed to bridge this gap by combining Large Language Models (LLMs) and Knowledge Graphs (KGs) to improve the accuracy and reliability of question-answering systems, demonstrated on a biomedical KG. Built on the LangChain framework, our method incorporates a query checker that ensures the syntactical and semantic validity of LLM-generated queries, which are then used to extract information from a Knowledge Graph, substantially reducing errors such as hallucinations. We evaluated the overall performance using a new benchmark dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. Our results indicate that while GPT-4 Turbo outperforms other models in generating accurate queries, open-source models like llama3:70b show promise with appropriate prompt engineering. To make this approach accessible, a user-friendly web-based interface has been developed, allowing users to input natural language queries, view generated and corrected Cypher queries, and verify the resulting paths for accuracy. Overall, this hybrid approach effectively addresses common issues such as data gaps and hallucinations, offering a reliable and intuitive solution for question-answering systems. The source code for generating the results of this paper and for the user interface can be found in our Git repository: this https URL
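To illustrate the kind of pipeline the abstract describes, below is a minimal sketch of an LLM-to-Cypher question-answering chain over a Neo4j knowledge graph using LangChain's GraphCypherQAChain. This is not the authors' implementation; the connection details, model choice, and example question are illustrative assumptions, and the built-in Cypher validation here only approximates the paper's dedicated query checker.

```python
# Minimal sketch (assumptions: local Neo4j instance, OpenAI access, placeholder question).
from langchain_openai import ChatOpenAI
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain

# Connect to the biomedical KG (placeholder credentials, not from the paper).
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)

# validate_cypher=True enables LangChain's built-in Cypher corrector
# (e.g. fixing relationship directions) before the query is executed,
# loosely analogous to the query-checking step described in the abstract.
chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    validate_cypher=True,
    return_intermediate_steps=True,   # expose the generated Cypher for inspection
    allow_dangerous_requests=True,    # required by recent LangChain versions
    verbose=True,
)

# Hypothetical biomedical question; the generated Cypher and the KG-grounded
# answer can both be inspected, mirroring the paper's verification workflow.
result = chain.invoke({"query": "Which genes are associated with Parkinson's disease?"})
print(result["intermediate_steps"][0]["query"])  # generated (and corrected) Cypher query
print(result["result"])                          # answer grounded in the KG results
```

Exposing the intermediate Cypher, as done here, is what allows a user interface to display the generated and corrected queries and the resulting paths for verification, as the abstract describes.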
URL
https://arxiv.org/abs/2409.04181