Paper Reading AI Learner

Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering

2024-09-06 10:49:46
Larissa Pusch, Tim O. F. Conrad

Abstract

Advancements in natural language processing have revolutionized the way we interact with digital information systems, such as databases, making them more accessible. However, challenges persist, especially when accuracy is critical, as in the biomedical domain. A key issue is the hallucination problem, where models generate information unsupported by the underlying data, potentially leading to dangerous misinformation. This paper presents a novel approach designed to bridge this gap by combining Large Language Models (LLMs) and Knowledge Graphs (KGs) to improve the accuracy and reliability of question-answering systems, using a biomedical KG as an example. Built on the LangChain framework, our method incorporates a query checker that ensures the syntactic and semantic validity of LLM-generated queries, which are then used to extract information from a Knowledge Graph, substantially reducing errors like hallucinations. We evaluated the overall performance using a new benchmark dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. Our results indicate that while GPT-4 Turbo outperforms other models in generating accurate queries, open-source models like llama3:70b show promise with appropriate prompt engineering. To make this approach accessible, we have developed a user-friendly web-based interface that allows users to input natural language queries, view the generated and corrected Cypher queries, and verify the resulting paths for accuracy. Overall, this hybrid approach effectively addresses common issues such as data gaps and hallucinations, offering a reliable and intuitive solution for question-answering systems. The source code for generating the results of this paper and for the user interface can be found in our Git repository: this https URL
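The abstract states that the query checker ensures the syntactic and semantic validity of LLM-generated Cypher queries, but it does not describe how. The following is a minimal, hypothetical sketch of what such a check can look like, assuming a toy schema. The `Schema` class, the `check_query` function, and the node labels and relationship names (`Gene`, `Disease`, `Drug`, `ASSOCIATED_WITH`, `TREATS`) are all invented for illustration. This is not the authors' code and not a LangChain API. It relies on regular expressions rather than a real Cypher parser, so it only verifies that a query is read-only, uses known node labels, and follows relationships in a direction the schema allows.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Hypothetical graph schema: allowed node labels and directed relationship triples."""
    labels: frozenset
    triples: frozenset  # {(source_label, relationship_type, target_label), ...}


# Clauses that modify the graph; a question-answering query should be read-only.
WRITE_CLAUSE = re.compile(r"\b(CREATE|MERGE|DELETE|SET|REMOVE|DROP)\b", re.IGNORECASE)
# Labelled node patterns such as "(g:Gene" or "(:Disease": captures the label.
NODE_LABEL = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
# Single hops "(a:X)-[:R]->(b:Y)", "(a:X)<-[:R]-(b:Y)" or "(a:X)-[:R]-(b:Y)".
# Wrapped in a lookahead so consecutive hops of a longer path overlap and are all found.
HOP = re.compile(
    r"(?=\(\s*\w*\s*:\s*(\w+)[^)]*\)\s*(<?)-\[\s*\w*\s*:\s*(\w+)[^\]]*\]-(>?)"
    r"\s*\(\s*\w*\s*:\s*(\w+))"
)


def check_query(query: str, schema: Schema) -> list[str]:
    """Return the problems found in a generated Cypher query (empty list if none)."""
    # Blank out string literals so keywords inside them are not mistaken for clauses.
    stripped = re.sub(r"'[^']*'|\"[^\"]*\"", "''", query)
    problems = []
    if WRITE_CLAUSE.search(stripped):
        problems.append("query is not read-only")
    for label in NODE_LABEL.findall(stripped):
        if label not in schema.labels:
            problems.append(f"unknown node label: {label}")
    for match in HOP.finditer(stripped):
        src, left, rel, right, dst = match.groups()
        if left and not right:    # (src)<-[:rel]-(dst): the edge runs dst -> src
            candidates = {(dst, rel, src)}
        elif right and not left:  # (src)-[:rel]->(dst): the edge runs src -> dst
            candidates = {(src, rel, dst)}
        else:                     # undirected pattern: accept either direction
            candidates = {(src, rel, dst), (dst, rel, src)}
        if not candidates & schema.triples:
            problems.append(f"relationship not in schema: ({src})-[:{rel}]-({dst})")
    return problems


# Usage with the hypothetical toy schema.
schema = Schema(
    labels=frozenset({"Gene", "Disease", "Drug"}),
    triples=frozenset({("Gene", "ASSOCIATED_WITH", "Disease"), ("Drug", "TREATS", "Disease")}),
)
good = "MATCH (g:Gene {name: 'BRCA1'})-[:ASSOCIATED_WITH]->(d:Disease) RETURN d.name"
bad = "MATCH (d:Disease)-[:ASSOCIATED_WITH]->(g:Gene) RETURN g.name"
print(check_query(good, schema))  # → []
print(check_query(bad, schema))   # → ['relationship not in schema: (Disease)-[:ASSOCIATED_WITH]-(Gene)']
```

Because the checker only reports problems, a pipeline built around it could either re-prompt the LLM with the error list or repair simple cases directly (for instance, by flipping a reversed arrow). The abstract mentions "corrected Cypher queries" but does not say which strategy the authors use.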

URL

https://arxiv.org/abs/2409.04181

PDF

https://arxiv.org/pdf/2409.04181.pdf