Abstract
Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making. However, reliably measuring the contrastiveness of the output summaries without relying on human evaluations remains an open problem. Prior work has proposed a token-overlap-based metric, Distinctiveness Score, to measure contrast, but it is not robust to meaning-preserving lexical variations. In this work, we propose an automated evaluation metric, CASPR, to better measure the contrast between a pair of summaries. Our metric is based on a simple and lightweight method that leverages the natural language inference (NLI) task: it segments reviews into single-claim sentences and carefully aggregates pairwise NLI scores between them into a summary-level score. We compare CASPR with Distinctiveness Score and a simple yet powerful baseline based on BERTScore. Our results on the prior dataset CoCoTRIP demonstrate that CASPR captures the contrastiveness of summary pairs more reliably than the baselines.
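The abstract describes the core recipe: split each summary into single-claim sentences, score cross-summary sentence pairs with an NLI model, and aggregate into a summary-level contrast score. A minimal sketch of that aggregation step is below; the exact segmentation and aggregation in the paper may differ, and `nli_contradiction_prob` stands in for any NLI model (e.g., a cross-encoder) returning the probability that one claim contradicts the other.

```python
def contrast_score(claims_a, claims_b, nli_contradiction_prob):
    """CASPR-style summary-level contrast: average NLI contradiction
    probability over all cross-summary claim pairs (an assumed
    aggregation; the paper's exact scheme may differ).
    Higher means the two summaries are more contrastive."""
    scores = [nli_contradiction_prob(a, b)
              for a in claims_a for b in claims_b]
    return sum(scores) / len(scores) if scores else 0.0


# Toy stand-in for a real NLI model, purely for illustration:
# it flags contradiction only for one hand-picked claim pair.
def toy_nli(premise, hypothesis):
    contradictory = {("the wifi is fast", "the wifi is slow")}
    return 1.0 if (premise, hypothesis) in contradictory else 0.0


summary_a = ["the wifi is fast", "rooms are clean"]
summary_b = ["the wifi is slow", "rooms are clean"]
print(contrast_score(summary_a, summary_b, toy_nli))  # 0.25
```

With a real NLI model in place of `toy_nli`, token-level paraphrases ("speedy" vs. "fast") would no longer inflate the score the way they can with overlap-based metrics, which is the failure mode CASPR targets.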
URL
https://arxiv.org/abs/2404.15565