Abstract
Multi-document summarization (MDS) aims to generate a single summary for a set of related documents. We propose HGSUM, an MDS model that extends the encoder-decoder architecture with a heterogeneous graph representing different semantic units of the documents (e.g., words and sentences). This contrasts with existing MDS models, which do not distinguish edge types in their graphs and therefore fail to capture the diversity of relationships in the documents. To preserve only the key information and relationships of the documents, HGSUM compresses the input heterogeneous graph with graph pooling. To guide this compression, we introduce an auxiliary objective that, during training, maximizes the similarity between the compressed graph and a graph constructed from the ground-truth summary. HGSUM is trained end-to-end with the graph-similarity and standard cross-entropy objectives. Experimental results on MULTI-NEWS, WCEP-100, and ARXIV show that HGSUM outperforms state-of-the-art MDS models. The code for our model and experiments is available at: this https URL.
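The abstract describes training with two objectives: standard cross-entropy over the generated summary tokens, plus a graph-similarity term that pulls the pooled (compressed) source graph toward the graph built from the gold summary. The sketch below illustrates that combined loss in plain NumPy. It is a simplified illustration, not the paper's implementation: the cosine-similarity measure, the pooled-embedding representation of each graph, and the weighting factor `lam` are all assumptions for demonstration.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean softmax cross-entropy over token positions.

    logits: (num_tokens, vocab_size); targets: (num_tokens,) of class ids.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(targets)), targets])

def graph_similarity_loss(g_compressed, g_summary):
    """Illustrative graph-similarity term: 1 - cosine similarity between a
    pooled embedding of the compressed source graph and one of the
    ground-truth-summary graph. (The actual similarity used by HGSUM may
    differ; cosine is an assumption here.)"""
    num = float(g_compressed @ g_summary)
    den = np.linalg.norm(g_compressed) * np.linalg.norm(g_summary) + 1e-8
    return 1.0 - num / den

def combined_loss(logits, targets, g_compressed, g_summary, lam=0.5):
    """End-to-end training loss: cross-entropy plus a weighted graph term.
    `lam` is a hypothetical weighting hyperparameter."""
    return cross_entropy(logits, targets) + lam * graph_similarity_loss(
        g_compressed, g_summary
    )
```

When the compressed graph embedding matches the summary graph embedding exactly, the similarity term vanishes and the loss reduces to plain cross-entropy, which is the intended behavior of the auxiliary objective.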
URL
https://arxiv.org/abs/2303.06565