Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models

2024-04-27 13:11:42
Zhongzhen Huang, Kui Xue, Yongqi Fan, Linjie Mu, Ruoyu Liu, Tong Ruan, Shaoting Zhang, Xiaofan Zhang

Abstract

Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment. To mitigate these shortcomings, retrieval-augmented generation (RAG) has been used to supply external knowledge that supports answer generation. However, applying such models to the medical domain is challenging because of the lack of domain-specific knowledge and the intricacy of real-world scenarios. In this study, we explore LLMs with a RAG framework for knowledge-intensive tasks in the medical field. To evaluate the capabilities of LLMs, we introduce MedicineQA, a multi-round dialogue benchmark that simulates real-world medication consultation and requires LLMs to answer with evidence retrieved from a medicine database. MedicineQA contains 300 multi-round question-answering pairs, each embedded within a detailed dialogue history, highlighting the challenge this knowledge-intensive task poses to current LLMs. We further propose a new Distill-Retrieve-Read framework in place of the previous Retrieve-then-Read. Specifically, the distillation and retrieval process uses a tool-calling mechanism to formulate search queries that emulate the keyword-based inquiries used by search engines. Experimental results show that our framework brings notable performance improvements and surpasses previous counterparts in evidence retrieval accuracy. This advancement sheds light on applying RAG to the medical domain.
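
The three stages named in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function names (distill, retrieve, read), the SearchCall structure, the tool name search_medicine, and the toy database are all assumptions made for illustration. A stopword filter stands in for the LLM that would emit the tool call, and keyword overlap stands in for the search engine.

```python
from dataclasses import dataclass
from typing import Dict, List

# Toy stand-in for a medicine database. Entries are placeholders, not medical facts.
DATABASE: Dict[str, str] = {
    "drug_a": "DrugA placeholder entry: dosage and usage notes.",
    "drug_b": "DrugB placeholder entry: interaction and storage notes.",
}

# Small hypothetical stopword list used by the stand-in distiller.
STOPWORDS = {"the", "a", "an", "is", "what", "of", "for", "i", "can",
             "it", "and", "how", "should", "about", "with", "my", "to"}


@dataclass
class SearchCall:
    """Hypothetical tool call: a tool name plus a keyword-style query."""
    tool: str
    query: List[str]


def _tokens(text: str) -> List[str]:
    """Lowercase the text and split it on whitespace after dropping punctuation."""
    for mark in "?.,:":
        text = text.replace(mark, " ")
    return text.lower().split()


def distill(history: List[str], question: str) -> SearchCall:
    """Distill: condense the dialogue history and question into search keywords.

    A real system would prompt an LLM to emit this call; a stopword filter
    stands in for it here.
    """
    keywords: List[str] = []
    for word in _tokens(" ".join(history + [question])):
        if word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return SearchCall(tool="search_medicine", query=keywords)


def retrieve(call: SearchCall, top_k: int = 1) -> List[str]:
    """Retrieve: rank database entries by keyword overlap with the query."""
    query = set(call.query)
    ranked = sorted(DATABASE.values(),
                    key=lambda text: len(set(_tokens(text)) & query),
                    reverse=True)
    return ranked[:top_k]


def read(question: str, evidence: List[str]) -> str:
    """Read: build the prompt a reader LLM would answer from."""
    return "Evidence:\n" + "\n".join(evidence) + "\nQuestion: " + question


if __name__ == "__main__":
    history = ["I was prescribed DrugB."]
    question = "How should I handle storage?"
    call = distill(history, question)
    print(call)
    print(read(question, retrieve(call)))
```

The point of the sketch is the ordering: the dialogue history is condensed into a keyword query before retrieval, rather than passing the raw multi-round conversation to the retriever as a Retrieve-then-Read pipeline would.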

URL

https://arxiv.org/abs/2404.17897

PDF

https://arxiv.org/pdf/2404.17897.pdf
