Abstract
Clinical trial matching is the task of identifying trials for which a patient may be eligible. The task requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of each trial. Done manually, it is labor-intensive and difficult to scale, so many patients miss out on potential therapeutic options. Recent advances in Large Language Models (LLMs) have made automated patient-trial matching possible, as shown in multiple concurrent research studies. However, current approaches are confined to constrained, often synthetic datasets that do not adequately mirror the complexities of real-world medical data. In this study, we present the first end-to-end, large-scale empirical evaluation of clinical trial matching using real-world EHRs. Our study showcases the capability of LLMs to accurately match patients with appropriate clinical trials. We perform experiments with proprietary LLMs, including GPT-4 and GPT-3.5, as well as our custom fine-tuned model, OncoLLM, and show that OncoLLM, despite its significantly smaller size, not only outperforms GPT-3.5 but also matches the performance of qualified medical doctors. All experiments were carried out on real-world EHRs, including clinical notes, and on the clinical trials available at a single cancer center in the United States.
URL
https://arxiv.org/abs/2404.15549