Abstract
The text of clinical notes can be a valuable source of patient information and clinical assessments. Historically, the primary approach for exploiting clinical notes has been information extraction: linking spans of text to concepts in a detailed domain ontology. However, recent work has demonstrated the potential of supervised machine learning to extract document-level codes directly from the raw text of clinical notes. We propose to bridge the gap between the two approaches with two novel syntheses: (1) treating extracted concepts as features, which are used to supplement or replace the text of the note; (2) treating extracted concepts as labels, which are used to learn a better representation of the text. Unfortunately, the resulting concepts do not yield performance gains on the document-level clinical coding task. We explore possible explanations and future research directions.
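As a rough illustration of synthesis (1), the sketch below shows one way extracted concepts could supplement or replace the text of a note before it is passed to a document-level classifier. It is not the authors' implementation: the span format, the function names `supplement_with_concepts` and `replace_with_concepts`, and the concept identifier `CUI_MI` are all hypothetical, and the spans are assumed to come from some upstream concept extractor and to be non-overlapping.

```python
from typing import Dict, List, Tuple

# Hypothetical extractor output: (start_token, end_token_exclusive, concept_id).
Span = Tuple[int, int, str]


def supplement_with_concepts(tokens: List[str], spans: List[Span]) -> List[str]:
    """Keep the original tokens and insert each concept ID right after its span."""
    ids_by_end: Dict[int, List[str]] = {}
    for _start, end, concept_id in spans:
        ids_by_end.setdefault(end, []).append(concept_id)

    out: List[str] = []
    for i, token in enumerate(tokens):
        # Concepts whose span ends just before token i are emitted first.
        out.extend(ids_by_end.get(i, []))
        out.append(token)
    # Concepts whose span runs to the end of the note.
    out.extend(ids_by_end.get(len(tokens), []))
    return out


def replace_with_concepts(tokens: List[str], spans: List[Span]) -> List[str]:
    """Replace each annotated span with its concept ID (assumes non-overlapping spans)."""
    span_by_start = {start: (end, concept_id) for start, end, concept_id in spans}

    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in span_by_start:
            end, concept_id = span_by_start[i]
            out.append(concept_id)
            i = max(end, i + 1)  # always advance, even for a degenerate span
        else:
            out.append(tokens[i])
            i += 1
    return out


# Toy note with one hypothetical concept covering "heart attack".
note = ["patient", "has", "heart", "attack", "today"]
concepts = [(2, 4, "CUI_MI")]

supplemented = supplement_with_concepts(note, concepts)
replaced = replace_with_concepts(note, concepts)
```

Either token sequence could then be fed to the same text classifier that would otherwise consume the raw note, which makes the two variants straightforward to compare against a text-only baseline.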
URL
https://arxiv.org/abs/1906.03380