Abstract
Temporal information extraction from unstructured text is essential for contextualizing events and deriving actionable insights, particularly in the medical domain. We address the task of extracting clinical events and their temporal relations using the well-studied I2B2 2012 Temporal Relations Challenge corpus. This task is inherently challenging due to complex clinical language, long documents, and sparse annotations. We introduce GRAPHTREX, a novel method that integrates span-based entity-relation extraction, clinical large pre-trained language models (LPLMs), and Heterogeneous Graph Transformers (HGT) to capture local and global dependencies. Our HGT component propagates information across the document through global landmarks that bridge distant entities. Our method improves on the state of the art, with a 5.5% improvement in the tempeval $F_1$ score over the previous best and up to 8.9% improvement on long-range relations, which present a formidable challenge. This work not only advances temporal information extraction but also lays the groundwork for improved diagnostic and prognostic models through enhanced temporal reasoning.
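To make the idea of landmarks bridging distant entities concrete, the following is a toy sketch and not the GRAPHTREX implementation. The abstract does not specify how the graph is built, so every detail here is an assumption made for illustration: the names `build_graph` and `hops`, the local window size, one landmark per fixed-size block of entities, and fully connected landmarks. It only shows the general effect that hub nodes reduce the number of hops (and hence the number of message-passing layers) needed for two far-apart entities to exchange information.

```python
from collections import deque


def build_graph(num_entities, window, landmark_every=None):
    """Build an undirected adjacency map over entity nodes 0..num_entities-1.

    Hypothetical construction (not taken from the paper):
    - each entity is linked to the next `window` entities in document order;
    - if `landmark_every` is set, one landmark node is added per block of
      that many entities, linked to every entity in its block, and all
      landmark nodes are linked to each other.
    """
    adj = {i: set() for i in range(num_entities)}

    # Local edges: entities that are close together in the document.
    for i in range(num_entities):
        for j in range(i + 1, min(i + window, num_entities - 1) + 1):
            adj[i].add(j)
            adj[j].add(i)

    if landmark_every:
        landmarks = []
        for start in range(0, num_entities, landmark_every):
            lm = ("landmark", start // landmark_every)
            adj[lm] = set()
            for i in range(start, min(start + landmark_every, num_entities)):
                adj[lm].add(i)
                adj[i].add(lm)
            landmarks.append(lm)
        # Global edges: every landmark is connected to every other landmark.
        for a in landmarks:
            for b in landmarks:
                if a != b:
                    adj[a].add(b)
    return adj


def hops(adj, src, dst):
    """Return the shortest-path length between two nodes via breadth-first search."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is unreachable from src


local_only = build_graph(40, window=2)
with_landmarks = build_graph(40, window=2, landmark_every=10)
print(hops(local_only, 0, 39))      # 20
print(hops(with_landmarks, 0, 39))  # 3
```

In this toy setting, the first and last of 40 entities are 20 hops apart with local edges alone, but only 3 hops apart (entity, landmark, landmark, entity) once landmarks are added. Since a graph neural network passes information one hop per layer, this kind of shortcut is one plausible reason a landmark design would help on long-range relations.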
URL
https://arxiv.org/abs/2503.18085