Abstract
This paper introduces OARelatedWork, the first large-scale multi-document summarization dataset for related work generation that contains whole related work sections and the full texts of cited papers. The dataset includes 94,450 papers and 5,824,689 unique referenced papers. It was designed to shift automatic related work generation away from producing parts of related work sections from abstracts only, which is currently the mainstream setup for abstractive approaches in this field, and toward producing entire related work sections from all available content. We show that the estimated upper bound for extractive summarization increases by 217% in ROUGE-2 score when full content is used instead of abstracts. Furthermore, we show the benefits of full-content data for naive, oracle, traditional, and transformer-based baselines. Long outputs, such as related work sections, pose challenges for automatic evaluation metrics like BERTScore because of the metrics' limited input length. We address this issue by proposing and evaluating a meta-metric built on BERTScore. We show that, despite operating on smaller blocks, this meta-metric correlates with human judgment comparably to the original BERTScore.
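The extractive upper bound above is reported as a ROUGE-2 score. One common way to estimate such a bound is a greedy oracle: repeatedly add the source sentence that most improves ROUGE-2 against the reference, and stop when no sentence helps. The abstract does not state the exact oracle procedure, tokenization, or ROUGE settings used, so the following is only a minimal sketch under assumptions: sentences are already tokenized, ROUGE-2 is computed as F1 over clipped bigram overlap without stemming, and the function names (`bigrams`, `rouge2_f1`, `greedy_oracle`) are hypothetical.

```python
from collections import Counter
from typing import List


def bigrams(tokens: List[str]) -> Counter:
    """Count the word bigrams in one token sequence."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(cand: Counter, ref: Counter) -> float:
    """ROUGE-2 F1 from bigram counts, with overlap clipped by the reference counts."""
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(sentences: List[List[str]], reference: List[str]) -> List[int]:
    """Greedy extractive oracle (an assumed procedure, not necessarily the paper's).

    Adds the sentence that most increases ROUGE-2 F1 and stops when no
    remaining sentence improves the score. Returns selected indices in order.
    """
    ref = bigrams(reference)
    selected: List[int] = []
    best_score = 0.0
    while True:
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Candidate bigrams are summed per sentence, so no bigrams span sentence boundaries.
            cand = sum((bigrams(sentences[j]) for j in selected + [i]), Counter())
            score = rouge2_f1(cand, ref)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:
            return sorted(selected)
        selected.append(best_idx)


if __name__ == "__main__":
    source = [["the", "cat", "sat"], ["a", "dog", "ran"], ["on", "the", "mat"]]
    reference = ["the", "cat", "sat", "on", "the", "mat"]
    print(greedy_oracle(source, reference))  # [0, 2]
```

Because the search is greedy, the result is an estimate of the extractive upper bound, not a guaranteed optimum. Running it once on abstracts and once on full texts would give the kind of comparison the abstract describes, under the stated assumptions.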
URL
https://arxiv.org/abs/2405.01930