Abstract
In recent years, research on named entity recognition (NER) has focused mainly on the general task. Nested NER in specific domains still poses challenges. In particular, low-resource and class-imbalanced scenarios impede wide application in the biomedical and industrial domains. In this study, we design a novel loss, EIoU-EMC, by enhancing the implementations of the Intersection over Union (IoU) loss and the Multiclass loss. The proposed method specifically leverages entity boundary and entity classification information, thereby enhancing the model's capacity to learn from a limited number of data samples. To validate the method, we conducted experiments on three distinct biomedical NER datasets and one dataset that we constructed from maintenance documents for complex industrial equipment. Compared to strong baselines, our method demonstrates competitive performance across all datasets. Experimental analysis shows that the proposed method achieves significant improvements in entity boundary recognition and entity classification. Our code is available here.
URL
https://arxiv.org/abs/2504.14203