PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models

2024-04-23 22:33:19
Shashi Kant Gupta, Aditya Basu, Mauro Nievas, Jerrin Thomas, Nathan Wolfrath, Adhitya Ramamurthi, Bradley Taylor, Anai N. Kothari, Therica M. Miller, Sorena Nadaf-Rahrov, Yanshan Wang, Hrituraj Singh

Abstract

Clinical trial matching is the task of identifying trials for which patients may be eligible. Typically, this task is labor-intensive and requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of clinical trials. The process is manual, time-intensive, and challenging to scale, resulting in many patients missing out on potential therapeutic options. Recent advancements in Large Language Models (LLMs) have made automating patient-trial matching possible, as shown in multiple concurrent research studies. However, current approaches are confined to constrained, often synthetic datasets that do not adequately mirror the complexities encountered in real-world medical data. In this study, we present the first end-to-end, large-scale empirical evaluation of clinical trial matching using real-world EHRs. Our study showcases the capability of LLMs to accurately match patients with appropriate clinical trials. We perform experiments with proprietary LLMs, including GPT-4 and GPT-3.5, as well as our custom fine-tuned model, OncoLLM, and show that OncoLLM, despite its significantly smaller size, not only outperforms GPT-3.5 but also matches the performance of qualified medical doctors. All experiments were carried out on real-world EHRs that include clinical notes and available clinical trials from a single cancer center in the United States.

URL

https://arxiv.org/abs/2404.15549

PDF

https://arxiv.org/pdf/2404.15549.pdf

