Task formulation for Extracting Social Determinants of Health from Clinical Narratives

Abstract

Objective: The 2022 n2c2 NLP Challenge posed the task of identifying social determinants of health (SDOH) in clinical narratives. We present three systems that we developed for the Challenge and discuss the distinctive task formulation used in each. Materials and Methods: The first system identifies target pieces of information independently using machine learning classifiers. The second system uses a large language model (LLM) to extract complete structured outputs per document. The third system extracts candidate phrases using machine learning and identifies target relations with hand-crafted rules. Results: The three systems achieved F1 scores of 0.884, 0.831, and 0.663 in Subtask A of the Challenge, ranking third, seventh, and eighth among the 15 participating teams. A review of the extraction results from our systems reveals characteristics of each approach and of the SDOH extraction task. Discussion: The phrases and relations annotated in the task are unique and diverse and do not conform to the conventional event extraction task. These annotations are difficult to model with limited training data. The system that extracts information independently, ignoring the annotated relations, achieves the highest F1 score. Meanwhile, the LLM, with its versatile capability, achieves a high F1 score while respecting the annotated relations. The rule-based system, which tackles relation extraction, obtains the lowest F1 score of the three but is the most explainable approach. Conclusion: The F1 scores of the three systems vary in this challenge setting, but each approach has advantages and disadvantages in a practical application. The selection of an approach depends not only on the F1 score but also on the requirements of the application.
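
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score over extracted items can be computed. It assumes exact-match comparison of (type, text) pairs; the Challenge's official scoring criteria are not described in this abstract and may differ. The function name `f1_score` and the example annotations are hypothetical and are not taken from the paper or the Challenge data.

```python
from typing import Hashable, Iterable


def f1_score(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> float:
    """Harmonic mean of precision and recall over exact-match items.

    Assumption: an extracted item counts as correct only if it is
    identical to a gold item. Real challenge scorers may be more lenient.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (type, text) pairs, for illustration only.
gold = {
    ("Alcohol", "drinks socially"),
    ("Tobacco", "quit smoking"),
    ("Employment", "retired"),
}
predicted = {
    ("Alcohol", "drinks socially"),
    ("Tobacco", "smokes"),
    ("Employment", "retired"),
}

score = f1_score(gold, predicted)
print(f"F1 = {score:.3f}")
```

In this toy case two of the three predictions match the gold items, so precision and recall are both 2/3 and F1 is also 2/3.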

URL

https://arxiv.org/abs/2301.11386

PDF

https://arxiv.org/pdf/2301.11386.pdf