Abstract
This paper addresses a traditional relation extraction task in a setting with limited annotated data and a narrow knowledge domain. We explore the task on a clinical corpus of 200 breast cancer follow-up treatment letters annotated with 16 distinct relation types. We experiment with window-bounded co-occurrence (WBC), an approach to extracting typed relations that uses an adjustable context window around entity mentions of a relevant type, and compare its performance with a more typical intra-sentential co-occurrence baseline. We further introduce a new bag-of-concepts (BoC) approach to feature engineering based on state-of-the-art word embeddings and word synonyms. We demonstrate the competitiveness of BoC by comparing it with more complex methods, and explore its effectiveness on this small dataset.
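To make the WBC idea concrete, the following is a minimal sketch of window-bounded candidate generation, not the authors' implementation. It assumes that mentions are already annotated with token offsets and that the window is measured as the number of tokens between two mentions; the abstract does not specify either detail. The names `Mention`, `token_gap`, `wbc_candidates`, and the entity types `Drug`, `Dose`, and `Duration` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    """An annotated entity mention (hypothetical structure)."""
    text: str
    etype: str   # entity type label, e.g. "Drug"
    start: int   # index of the first token
    end: int     # index one past the last token


def token_gap(a: Mention, b: Mention) -> int:
    """Tokens strictly between two mentions; 0 if adjacent or overlapping."""
    if a.end <= b.start:
        return b.start - a.end
    if b.end <= a.start:
        return a.start - b.end
    return 0


def wbc_candidates(mentions, head_type, tail_type, window):
    """Pair every head_type mention with every tail_type mention whose
    token gap is at most `window` (assumed window definition)."""
    heads = [m for m in mentions if m.etype == head_type]
    tails = [m for m in mentions if m.etype == tail_type]
    return [
        (h, t)
        for h, t in product(heads, tails)
        if h != t and token_gap(h, t) <= window
    ]


# Toy example (invented text, not from the corpus):
# tokens: She started tamoxifen 20 mg daily . Review in six months .
mentions = [
    Mention("tamoxifen", "Drug", 2, 3),
    Mention("20 mg", "Dose", 3, 5),
    Mention("six months", "Duration", 9, 11),
]

near = wbc_candidates(mentions, "Drug", "Dose", window=2)
far_small = wbc_candidates(mentions, "Drug", "Duration", window=3)
far_large = wbc_candidates(mentions, "Drug", "Duration", window=6)
```

In this sketch the window can cross sentence boundaries, which is the main difference from an intra-sentential co-occurrence baseline: widening `window` from 3 to 6 lets the `Drug` mention pair with a `Duration` mention in the next sentence.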
URL
https://arxiv.org/abs/1904.10743