Abstract
Low-resource named entity recognition is still an open problem in NLP. Most state-of-the-art systems require tens of thousands of annotated sentences in order to obtain high performance. However, for most of the world's languages, it is infeasible to obtain such annotation. In this paper, we present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource languages and low-resource languages jointly. Learning character representations for multiple related languages allows transfer among the languages, improving F1 by up to 9.8 points over a log-linear CRF baseline.
Abstract (translated)
Low-resource named entity recognition is still an open problem in natural language processing (NLP). Most state-of-the-art systems require tens of thousands of annotated sentences to obtain high performance. However, for most of the world's languages, obtaining such annotation is infeasible. In this paper, we present a transfer learning scheme in which we train character-level neural CRFs to predict named entities for high-resource and low-resource languages jointly. Learning character representations for multiple related languages allows transfer among the languages, improving F1 by up to 9.8 points over a log-linear CRF baseline.
URL
https://arxiv.org/abs/2404.09383
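
The core idea in the abstract is that a single character-level neural CRF, with character embeddings and encoder weights shared across related languages, can be trained jointly on a high-resource and a low-resource corpus so that the low-resource side benefits from transfer. The following is a minimal sketch of that setup, not the authors' implementation: the `CharNeuralCRF` class, the BiLSTM encoder, all hyperparameters, and the stand-in corpora are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CharNeuralCRF(nn.Module):
    """Character-level linear-chain neural CRF; parameters are shared across languages."""
    def __init__(self, n_chars, n_tags, char_dim=32, hidden=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)          # shared character space
        self.encoder = nn.LSTM(char_dim, hidden // 2,
                               bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden, n_tags)                    # per-character emission scores
        self.trans = nn.Parameter(torch.zeros(n_tags, n_tags))   # CRF transition scores

    def scores(self, chars):                 # chars: (T,) LongTensor, one sequence
        h, _ = self.encoder(self.char_emb(chars).unsqueeze(0))
        return self.emit(h.squeeze(0))        # (T, n_tags)

    def nll(self, chars, tags):
        """Negative log-likelihood of a tag sequence under the linear-chain CRF."""
        e = self.scores(chars)
        # Score of the gold path: emissions plus transitions.
        gold = e[0, tags[0]] + sum(e[t, tags[t]] + self.trans[tags[t - 1], tags[t]]
                                   for t in range(1, len(tags)))
        # Log partition function via the forward algorithm:
        # alpha_t[j] = e_t[j] + logsumexp_i(alpha_{t-1}[i] + trans[i, j])
        alpha = e[0]
        for t in range(1, e.size(0)):
            alpha = e[t] + torch.logsumexp(alpha.unsqueeze(1) + self.trans, dim=0)
        return torch.logsumexp(alpha, dim=0) - gold

# Joint training: sequences from both languages pass through the SAME model,
# so character representations learned on the high-resource language transfer.
model = CharNeuralCRF(n_chars=100, n_tags=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
high = [(torch.randint(0, 100, (12,)), torch.randint(0, 5, (12,)))]  # toy high-resource data
low  = [(torch.randint(0, 100, (8,)),  torch.randint(0, 5, (8,)))]   # toy low-resource data
for chars, tags in high + low:
    opt.zero_grad()
    model.nll(chars, tags).backward()
    opt.step()
```

The design choice doing the work here is that related languages share one character vocabulary and one encoder, so gradients from the high-resource corpus shape the representations the low-resource tagger relies on; how corpora are mixed per step is a training detail the abstract leaves open.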