Abstract
Joint entity and relation extraction (JERE) is one of the most important tasks in information extraction. However, most existing work focuses on sentence-level, coarse-grained JERE, which has limitations in real-world scenarios. In this paper, we construct DocRED-FE, a large-scale, document-level, fine-grained JERE dataset that improves DocRED with fine-grained entity types. Specifically, we redesign a hierarchical entity type schema comprising 11 coarse-grained types and 119 fine-grained types, and then manually re-annotate DocRED according to this schema. Through comprehensive experiments we find that (1) DocRED-FE is challenging for existing JERE models, and (2) our fine-grained entity types promote relation classification. We make DocRED-FE, with instructions, and the code for our baselines publicly available.
URL
https://arxiv.org/abs/2303.11141