Abstract
This study examines the potential of integrating Learning-to-Rank (LTR) with Query-focused Summarization (QFS) to improve summary relevance via content prioritization. Using a secondary decoder shared with the summarization decoder, we carry out the LTR task at the segment level. Compared to the state-of-the-art, our model outperforms it on the QMSum benchmark across all metrics and matches it on two metrics of the SQuALITY benchmark, as measured by Rouge and BertScore, while incurring lower training overhead. Specifically, on the QMSum benchmark, our proposed system achieves improvements, particularly in Rouge-L (+0.42) and BertScore (+0.34), indicating enhanced comprehension and relevance. While it faces minor setbacks in Rouge-1 and Rouge-2 on the SQuALITY benchmark, the model excels significantly in Rouge-L (+1.47), underscoring its ability to generate coherent summaries. Human evaluations confirm the efficacy of our method in terms of the relevance and faithfulness of the generated summaries, without sacrificing fluency. A deeper qualitative analysis reveals our model's superiority over the state-of-the-art for broad queries, as opposed to specific ones. We further present an error analysis of our model, pinpointing the challenges faced and suggesting directions for future research in this field.
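The abstract describes a multi-task setup in which a secondary decoder, shared with the summarization decoder, scores source segments for query relevance (the LTR task). A minimal sketch of such a joint objective is shown below; the pairwise logistic ranking loss, the loss weighting, and all function names are assumptions for illustration, not the paper's actual implementation.

```python
import math

def pairwise_ltr_loss(scores, relevance):
    """Pairwise logistic ranking loss over segment scores.

    scores: model scores assigned to each source segment.
    relevance: gold relevance labels (higher = more query-relevant).
    """
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if relevance[i] > relevance[j]:  # segment i should outrank j
                loss += math.log(1.0 + math.exp(scores[j] - scores[i]))
                pairs += 1
    return loss / max(pairs, 1)

def joint_loss(summarization_loss, ranking_loss, weight=0.5):
    # Hypothetical joint objective: summarization loss plus a
    # weighted segment-level ranking loss from the shared decoder.
    return summarization_loss + weight * ranking_loss

# Example: three segments, segment 0 most relevant to the query.
scores = [2.0, 0.5, -1.0]
relevance = [2, 1, 0]
print(round(joint_loss(1.20, pairwise_ltr_loss(scores, relevance)), 3))
```

Well-calibrated segment scores lower the ranking term, which is one way a shared decoder could be pushed to prioritize query-relevant content during generation.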
URL
https://arxiv.org/abs/2411.00324