Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP

2025-03-21 10:18:47
Veysel Kocaman, Yigit Gul, M. Aytug Kaya, Hasham Ul Haq, Mehmet Butgul, Cabir Celik, David Talby

Abstract

Assertion status detection is a critical yet often overlooked component of clinical NLP, essential for accurately attributing extracted medical facts. Past studies have narrowly focused on negation detection, leading to underperforming commercial solutions such as AWS Medical Comprehend, Azure AI Text Analytics, and GPT-4o due to their limited domain adaptation. To address this gap, we developed state-of-the-art assertion detection models, including fine-tuned LLMs, transformer-based classifiers, few-shot classifiers, and deep learning (DL) approaches. We evaluated these models against cloud-based commercial API solutions, the legacy rule-based NegEx approach, and GPT-4o. Our fine-tuned LLM achieves the highest overall accuracy (0.962), outperforming GPT-4o (0.901) and commercial APIs by a notable margin, particularly excelling in Present (+4.2%), Absent (+8.4%), and Hypothetical (+23.4%) assertions. Our DL-based models surpass commercial solutions in Conditional (+5.3%) and Associated-with-Someone-Else (+10.1%) categories, while the few-shot classifier offers a lightweight yet highly competitive alternative (0.929), making it ideal for resource-constrained environments. Integrated within Spark NLP, our models consistently outperform black-box commercial solutions while enabling scalable inference and seamless integration with medical NER, Relation Extraction, and Terminology Resolution. These results reinforce the importance of domain-adapted, transparent, and customizable clinical NLP solutions over general-purpose LLMs and proprietary APIs.
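
To make the five assertion categories named in the abstract concrete, here is a minimal keyword-trigger sketch in the spirit of rule-based baselines such as NegEx. It is not the NegEx lexicon and not any of the paper's models: the trigger patterns, their priority order, and the function name `assertion_status` are all assumptions chosen for illustration, and the function labels a whole sentence rather than scoping triggers to a specific entity mention.

```python
import re

# Hypothetical trigger patterns, checked in priority order.
# Illustrative only: a real rule-based system uses a much larger
# lexicon and restricts each trigger to a window around the entity.
TRIGGERS = [
    ("Associated-with-Someone-Else",
     re.compile(r"\b(mother|father|sister|brother|family history)\b")),
    ("Hypothetical", re.compile(r"\b(if|should|in case)\b")),
    ("Conditional", re.compile(r"\b(on exertion|upon)\b")),
    ("Absent", re.compile(r"\b(no|denies|without|negative for)\b")),
]


def assertion_status(sentence: str) -> str:
    """Return the first label whose trigger occurs in the sentence.

    Falls back to "Present" when no trigger matches.
    """
    text = sentence.lower()
    for label, pattern in TRIGGERS:
        if pattern.search(text):
            return label
    return "Present"


if __name__ == "__main__":
    examples = [
        "Patient has hypertension.",
        "Patient denies chest pain.",
        "Return if fever develops.",
        "Shortness of breath on exertion.",
        "Her mother had breast cancer.",
    ]
    for sentence in examples:
        print(f"{assertion_status(sentence):30s} {sentence}")
```

A fixed trigger list like this breaks down quickly on real clinical text (for example, "no" inside "no longer denies"), which is the kind of limitation that motivates the learned models the abstract compares against rule-based and general-purpose baselines.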

URL

https://arxiv.org/abs/2503.17425

PDF

https://arxiv.org/pdf/2503.17425.pdf
