Abstract
Assertion status detection is a critical yet often overlooked component of clinical NLP, essential for correctly attributing extracted medical facts (for example, whether a finding is present, absent, hypothetical, or associated with someone other than the patient). Past studies have focused narrowly on negation detection, and commercial solutions such as AWS Medical Comprehend, Azure AI Text Analytics, and GPT-4o underperform because of their limited domain adaptation. To address this gap, we developed state-of-the-art assertion detection models, including fine-tuned LLMs, transformer-based classifiers, few-shot classifiers, and deep learning (DL) approaches. We evaluated these models against cloud-based commercial APIs, the legacy rule-based NegEx approach, and GPT-4o. Our fine-tuned LLM achieves the highest overall accuracy (0.962), outperforming GPT-4o (0.901) and the commercial APIs by a notable margin, and excels in particular on Present (+4.2%), Absent (+8.4%), and Hypothetical (+23.4%) assertions. Our DL-based models surpass commercial solutions on the Conditional (+5.3%) and Associated-with-Someone-Else (+10.1%) categories, while the few-shot classifier offers a lightweight yet highly competitive alternative (0.929) that suits resource-constrained environments. Integrated within Spark NLP, our models consistently outperform black-box commercial solutions while enabling scalable inference and seamless integration with medical NER, Relation Extraction, and Terminology Resolution. These results underscore the value of domain-adapted, transparent, and customizable clinical NLP solutions over general-purpose LLMs and proprietary APIs.
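To make the task concrete, here is a minimal, hypothetical sketch of a NegEx-style rule-based baseline, the kind of legacy approach the abstract compares against. It is not the authors' models and not the real NegEx lexicon: the trigger lists, the five-token forward scope, and the function name `assertion_status` are illustrative assumptions, and the Conditional category is omitted for brevity. Any entity that no trigger covers defaults to Present.

```python
import re

# Hypothetical, heavily simplified trigger lists (assumption); the real
# NegEx/ConText lexicons are far larger and are not reproduced here.
# Dict order sets priority: the first label whose trigger covers the entity wins.
TRIGGERS = {
    "Absent": ["no", "denies", "without", "negative for"],
    "Hypothetical": ["if", "in case of"],
    "Associated-with-Someone-Else": ["mother", "father", "family history of"],
}
WINDOW = 5  # tokens after a trigger that fall within its scope (illustrative choice)


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def assertion_status(sentence: str, entity: str) -> str:
    """Label `entity` in `sentence` by forward-scoped trigger matching."""
    tokens = tokenize(sentence)
    ent = tokenize(entity)
    # Find where the entity starts in the sentence.
    start = next(
        (i for i in range(len(tokens) - len(ent) + 1) if tokens[i:i + len(ent)] == ent),
        None,
    )
    if start is None:
        raise ValueError(f"entity {entity!r} not found in sentence")
    for label, phrases in TRIGGERS.items():
        for phrase in phrases:
            p = phrase.split()
            for j in range(len(tokens) - len(p) + 1):
                if tokens[j:j + len(p)] == p:
                    scope_start = j + len(p)  # first token after the trigger
                    if scope_start <= start < scope_start + WINDOW:
                        return label
    return "Present"


if __name__ == "__main__":
    for sent, ent in [
        ("Patient denies chest pain.", "chest pain"),
        ("Mother has diabetes.", "diabetes"),
        ("Call if fever develops.", "fever"),
        ("Chest pain resolved.", "chest pain"),
    ]:
        print(f"{ent!r} in {sent!r}: {assertion_status(sent, ent)}")
```

Fixed trigger windows like this cannot capture context that falls outside the lexicon or the scope. That brittleness is one reason the abstract favors learned, domain-adapted models over rule-based baselines.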
URL
https://arxiv.org/abs/2503.17425