Paper Reading AI Learner

MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization

2023-01-28 23:08:25
Potsawee Manakul, Adian Liusie, Mark J. F. Gales

Abstract

State-of-the-art summarization systems can generate highly fluent summaries. These summaries, however, may contain factual inconsistencies and/or information not present in the source. Hence, an important component of assessing the quality of summaries is to determine whether there is information consistency between the source and the summary. Existing approaches are typically based on lexical matching or representation-based methods. In this work, we introduce an alternative scheme based on standard information-theoretic measures in which the information present in the source and summary is directly compared. We propose a Multiple-choice Question Answering and Generation framework, MQAG, which approximates the information consistency by computing the expected KL-divergence between summary and source answer distributions over automatically generated multiple-choice questions. This approach exploits multiple-choice answer probabilities, as predicted answer distributions can be easily compared. We conduct experiments on four summary evaluation datasets: QAG-CNNDM/XSum, XSum-Faithfulness, Podcast Assessment, and SummEval. Experiments show that MQAG (using models trained on RACE) outperforms existing evaluation methods on the majority of tasks.
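The core quantity the abstract describes — an expected KL-divergence between summary-conditioned and source-conditioned answer distributions over generated multiple-choice questions — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the question generator and the QA model are abstracted away as precomputed answer distributions, the probabilities are hypothetical, and the KL direction shown is one plausible choice.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same answer options.
    eps guards against zero probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mqag_score(question_answer_dists):
    """Average KL between summary-conditioned and source-conditioned answer
    distributions, taken over automatically generated multiple-choice questions.
    Each item is a pair (p_summary, p_source) of distributions over options.
    Higher score -> less information consistency."""
    divergences = [kl_divergence(p_sum, p_src)
                   for p_sum, p_src in question_answer_dists]
    return sum(divergences) / len(divergences)

# Toy example with two 4-option questions (hypothetical probabilities):
dists = [
    ([0.70, 0.10, 0.10, 0.10], [0.70, 0.10, 0.10, 0.10]),  # answers agree
    ([0.90, 0.05, 0.03, 0.02], [0.10, 0.60, 0.20, 0.10]),  # answers disagree
]
score = mqag_score(dists)
```

Because both models answer over the same fixed option set, the two distributions are directly comparable — this is the advantage of the multiple-choice setup over free-form QA-based consistency metrics.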

URL

https://arxiv.org/abs/2301.12307

PDF

https://arxiv.org/pdf/2301.12307.pdf

