Abstract
Document-level relation extraction (DocRE) is the task of identifying and extracting relations between entities across multiple sentences of a document. Because this setting is realistic, DocRE has attracted increasing research attention in recent years. Previous research has mostly focused on developing sophisticated encoding models to better capture the intricate patterns between entity pairs. While these advances are important, a more fundamental challenge lies in the data itself. The complexity inherent in DocRE makes the labeling process prone to errors, and positive relation samples are extremely sparse, owing to both the limited number of positive instances and the broad diversity of positive relation types. These factors can bias the optimization process, further complicating accurate relation extraction. To address these challenges, we develop a robust framework called COMM. COMM first employs an instance-aware reasoning method to dynamically capture the information in the document that is pertinent to each entity pair and to extract relational features. COMM then takes into account the distribution of relations and the difficulty of samples to dynamically adjust the margins between prediction logits and the decision threshold, a process we call Concentrated Margin Maximization. In this way, COMM not only enhances the extraction of relevant relational features but also boosts DocRE performance by addressing the specific challenges posed by the data. Extensive experiments and analysis demonstrate the versatility and effectiveness of COMM, especially its robustness when trained on low-quality data, achieving performance gains of more than 10%.
Abstract (translated)
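Document-level relation extraction (DocRE) refers to identifying and extracting relations between entities that span multiple sentences within a document. Because of its realistic setting, DocRE has attracted growing research attention in recent years. Previous research has focused mainly on developing sophisticated encoding models to better capture the subtle patterns between entity pairs. Although these advances are undoubtedly crucial, a more fundamental challenge lies in the data itself. The inherent complexity of DocRE makes the annotation process error-prone, and positive relation samples are extremely sparse because positive instances are limited in number and positive relation types are highly diverse. These factors can bias optimization and further increase the difficulty of the relation extraction task. Recognizing these challenges, we developed a robust framework named **COMM (Concentrated Margin Maximization)** to better solve DocRE. **COMM** first applies an instance-aware reasoning method to dynamically capture the relevant information of entity pairs in the document and extract relational features. **COMM** then considers the relation distribution and sample difficulty, and dynamically adjusts the distance between the prediction confidence and the decision threshold, a process called Concentrated Margin Maximization. In this way, **COMM** not only strengthens the extraction of relevant relational features but also improves DocRE performance by addressing the challenges specific to the data. Extensive experiments and analysis demonstrate the versatility and effectiveness of **COMM**, which performs especially well when trained on low-quality data (performance gains of more than 10%).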
URL
https://arxiv.org/abs/2503.13885