Multi-Document Summarization with Centroid-Based Pretraining

2022-08-01 17:28:02

Ratish Puduppully, Mark Steedman

arXiv_CL

arXiv_CL Summarization Zero-Shot

Abstract

In multi-document summarization (MDS), the input is a cluster of documents, and the output is the cluster summary. In this paper, we focus on pretraining objectives for MDS. Specifically, we introduce a simple pretraining objective of choosing the ROUGE-based centroid of each document cluster as a proxy for its summary. Our objective thus does not require human-written summaries and can be used for pretraining on a dataset containing only clusters of documents. Through zero-shot and fully supervised experiments on multiple MDS datasets, we show that our model Centrum is better than or comparable to a state-of-the-art model. We release our pretrained and finetuned models at this https URL.
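
The centroid selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rouge1_f1` and `select_centroid` are hypothetical, and a plain whitespace-token unigram-overlap F1 stands in for a full ROUGE scorer. The assumption here is that the centroid is the document with the highest average score against the other documents in its cluster.

```python
from collections import Counter
from typing import List


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1, a simplified stand-in for ROUGE-1 (assumption)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at its minimum frequency.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select_centroid(cluster: List[str]) -> int:
    """Return the index of the document most similar, on average, to the rest."""
    if len(cluster) < 2:
        raise ValueError("a cluster needs at least two documents")
    best_index, best_score = 0, -1.0
    for i, candidate in enumerate(cluster):
        others = [doc for j, doc in enumerate(cluster) if j != i]
        score = sum(rouge1_f1(candidate, other) for other in others) / len(others)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Toy cluster, invented for illustration only.
cluster = [
    "the storm hit the coast on monday",
    "a storm hit the coast and flooded roads on monday",
    "flooded roads closed schools on tuesday",
]
centroid_index = select_centroid(cluster)
print(centroid_index, cluster[centroid_index])
```

In a pretraining setup along these lines, the selected document would presumably serve as the target and the remaining documents as the input; that pairing is an assumption here, since the abstract only states that the centroid acts as a proxy for the summary.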

URL

https://arxiv.org/abs/2208.01006

PDF

https://arxiv.org/pdf/2208.01006.pdf