Abstract
Missing information is a common issue in dialogue summarization, where some information in the reference summaries is not covered by the generated summaries. To address this issue, we propose to utilize natural language inference (NLI) models to improve coverage while avoiding the introduction of factual inconsistencies. Specifically, we use NLI to compute fine-grained training signals that encourage the model to generate content from the reference summaries that has not yet been covered, and to distinguish between factually consistent and inconsistent generated sentences. Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach in balancing coverage and faithfulness, validated with both automatic metrics and human evaluations. Additionally, we compute the correlation between commonly used automatic metrics and human judgments along three dimensions related to coverage and factual consistency, providing insight into the most suitable metric for evaluating dialogue summaries.
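To make the idea concrete, here is a minimal sketch of how NLI scores could flag uncovered reference content. The paper uses a trained NLI model; `nli_entail` below is a hypothetical stand-in based on token overlap, purely for illustration, and the threshold is an assumption, not a value from the paper.

```python
# Sketch: NLI-based coverage signal for dialogue summarization.
# `nli_entail` is a placeholder for a real NLI model's entailment probability.

def nli_entail(premise: str, hypothesis: str) -> float:
    """Placeholder entailment score in [0, 1] (stand-in for a real NLI model)."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

def coverage_signals(reference_sents, generated_summary, threshold=0.5):
    """Return reference sentences not entailed by the generated summary,
    i.e. candidates for an 'uncovered content' training signal."""
    uncovered = []
    for sent in reference_sents:
        score = nli_entail(generated_summary, sent)
        if score < threshold:
            uncovered.append((sent, score))
    return uncovered

refs = ["Amanda will meet Jerry at 5 pm.", "They will bring cookies."]
gen = "Amanda will meet Jerry at 5 pm."
print(coverage_signals(refs, gen))  # the second reference sentence is flagged
```

In the actual method, such per-sentence signals would feed back into training; the symmetric direction (checking generated sentences against the dialogue) would serve to penalize factually inconsistent output.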
URL
https://arxiv.org/abs/2301.10483