Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model

Abstract

Objective: Clinical trials are essential for advancing pharmaceutical interventions, but selecting eligible participants remains a bottleneck. Although recruiting from electronic health records (EHR) has gained popularity, the complexity of unstructured medical text makes it difficult to identify participants efficiently. Natural Language Processing (NLP) techniques have emerged as a solution, with a recent focus on transformer models. In this study, we aimed to evaluate the performance of a prompt-based large language model on the cohort selection task using unstructured medical notes collected in the EHR. Methods: To process the medical records, we selected the sentences in each record most closely related to the eligibility criteria of the trial. The SNOMED CT concepts related to each eligibility criterion were collected. Medical records were also annotated with MedCAT based on the SNOMED CT ontology. Annotated sentences containing concepts that matched the criteria-relevant terms were extracted. A prompt-based large language model (a Generative Pre-trained Transformer (GPT) in this study) was then used with the extracted sentences as the training set. To assess its effectiveness, we evaluated the model on the dataset from the 2018 n2c2 challenge, which aimed to classify the medical records of 311 patients against 13 eligibility criteria using NLP techniques. Results: Our proposed model achieved overall micro- and macro-averaged F-measures of 0.9061 and 0.8060, respectively, which were among the highest scores reported for experiments on this dataset. Conclusion: Applying a prompt-based large language model to classify patients by eligibility criteria yielded promising scores. In addition, we proposed an extractive summarization method based on the SNOMED CT ontology that can also be applied to other medical texts.
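The sentence-extraction step described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the sentences have already been annotated (for example by MedCAT) so that each one carries a set of concept identifiers. The names `AnnotatedSentence`, `select_relevant_sentences`, and `build_prompt`, the criterion name, the `HYP-...` identifiers (placeholders, not real SNOMED CT codes), and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedSentence:
    """A sentence plus the concept IDs an annotator attached to it (assumed given)."""
    text: str
    concept_ids: frozenset


def select_relevant_sentences(sentences, criterion_concepts):
    """Keep sentences sharing at least one concept ID with the criterion."""
    return [s.text for s in sentences if s.concept_ids & criterion_concepts]


def build_prompt(criterion_name, selected_sentences):
    """Join the extracted sentences into an illustrative prompt for one criterion."""
    context = " ".join(selected_sentences) if selected_sentences else "(no relevant sentences)"
    return f"Criterion: {criterion_name}\nNotes: {context}\nDoes the patient meet the criterion?"


# Hypothetical criterion-to-concept mapping; the IDs are placeholders only.
CRITERIA = {
    "diabetes-complications": frozenset({"HYP-002", "HYP-003"}),
}

# Hypothetical pre-annotated record.
RECORD = [
    AnnotatedSentence("Patient has type 2 diabetes with neuropathy.",
                      frozenset({"HYP-001", "HYP-002"})),
    AnnotatedSentence("Follow up in two weeks.", frozenset()),
    AnnotatedSentence("Denies alcohol use.", frozenset({"HYP-010"})),
]

if __name__ == "__main__":
    for name, concepts in CRITERIA.items():
        kept = select_relevant_sentences(RECORD, concepts)
        print(build_prompt(name, kept))
```

Filtering by set intersection keeps only the sentences that mention a concept tied to the criterion, which shortens the text passed to the language model. How the criterion concepts are collected and how matches are decided in the study itself is not specified beyond the description above.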

URL

https://arxiv.org/abs/2404.16198

PDF

https://arxiv.org/pdf/2404.16198.pdf
