Paper Reading AI Learner

Do Multi-Document Summarization Models Synthesize?

2023-01-31 18:40:46
Jay DeYoung, Stephanie C. Martinez, Iain J. Marshall, Byron C. Wallace

Abstract

Multi-document summarization entails producing concise synopses of collections of inputs. For some applications, the synopsis should accurately synthesize inputs with respect to a key property or aspect. For example, a synopsis of film reviews all written about a particular movie should reflect the average critic consensus. As a more consequential example, consider narrative summaries that accompany biomedical systematic reviews of clinical trial results. These narratives should fairly summarize the potentially conflicting results from individual trials. In this paper we ask: To what extent do modern multi-document summarization models implicitly perform this type of synthesis? To assess this, we perform a suite of experiments that probe the degree to which conditional generation models trained for summarization using standard methods yield outputs that appropriately synthesize inputs. We find that existing models do partially perform synthesis, but do so imperfectly. In particular, they are over-sensitive to changes in input ordering and under-sensitive to changes in input composition (e.g., the ratio of positive to negative movie reviews). We propose a simple, general method for improving model synthesis capabilities by generating an explicitly diverse set of candidate outputs, and then selecting from these the string best aligned with the expected aggregate measure for the inputs, or abstaining when the model produces no good candidate. This approach improves model synthesis performance. We hope that highlighting the need for synthesis (in some summarization settings) motivates further research into multi-document summarization methods and learning objectives that explicitly account for the need to synthesize.
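To make the proposed select-or-abstain idea concrete, below is a minimal, hypothetical Python sketch of the general recipe the abstract describes: sample a diverse set of candidate summaries, score each against the expected aggregate property of the inputs (e.g., mean review sentiment), and return the best-aligned candidate or abstain when none is close enough. The helpers `generate_diverse_candidates`, `estimate_property`, and `input_aggregate`, along with the `tolerance` threshold, are illustrative assumptions, not the paper's released code.

```python
from typing import Callable, List, Optional


def select_or_abstain(
    inputs: List[str],
    generate_diverse_candidates: Callable[[List[str], int], List[str]],
    estimate_property: Callable[[str], float],      # e.g., sentiment score of a summary
    input_aggregate: Callable[[List[str]], float],  # e.g., mean sentiment of the inputs
    num_candidates: int = 8,
    tolerance: float = 0.1,
) -> Optional[str]:
    """Pick the candidate whose estimated property is closest to the input aggregate,
    or abstain (return None) if no candidate falls within `tolerance` of it."""
    target = input_aggregate(inputs)
    candidates = generate_diverse_candidates(inputs, num_candidates)
    # Score each candidate by its distance from the expected aggregate measure.
    scored = [(abs(estimate_property(c) - target), c) for c in candidates]
    gap, best = min(scored, key=lambda pair: pair[0])
    return best if gap <= tolerance else None
```

In practice, the candidate generator could be any conditional generation model decoded with diversity-promoting settings, and the property estimator could be any classifier or regressor for the aspect being synthesized; the sketch only fixes the selection and abstention logic.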


URL

https://arxiv.org/abs/2301.13844

PDF

https://arxiv.org/pdf/2301.13844.pdf

