Abstract
Opinions in the scientific domain can be divergent, leading to controversy or consensus among reviewers. However, current opinion summarization datasets mostly focus on product reviews and assume that the input opinions are non-controversial, failing to account for this variability. To address this gap, we propose the task of scientific opinion summarization, in which research paper reviews are synthesized into meta-reviews. To facilitate this task, we introduce a new dataset, ORSUM, covering 10,989 paper meta-reviews and 40,903 paper reviews from 39 conferences. Furthermore, we propose the Checklist-guided Iterative Introspection (CGI$^2$) approach, which breaks the task down into several stages and iteratively refines the summary under the guidance of questions from a checklist. We conclude that (1) human-written summaries are not always reliable, since many do not follow the guidelines, and (2) the combination of task decomposition and iterative self-refinement shows promising discussion-involvement ability and can be applied to other complex text generation tasks using black-box LLMs.
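The abstract describes CGI$^2$ as task decomposition plus checklist-guided iterative refinement. A minimal sketch of that loop is below; the checklist questions, the `llm()` helper, and the two-stage prompt wording are illustrative assumptions, not the paper's actual prompts (here `llm()` is a trivial stub standing in for any black-box LLM API):

```python
# Sketch of checklist-guided iterative introspection (the CGI^2 idea):
# draft a meta-review, then repeatedly critique it against checklist
# questions and revise. All prompts and the checklist are hypothetical.

CHECKLIST = [
    "Does the summary state the main strengths raised by reviewers?",
    "Does the summary state the main weaknesses raised by reviewers?",
    "Does the summary acknowledge disagreements among reviewers?",
]

def llm(prompt: str) -> str:
    # Stub: a real system would call a black-box LLM here.
    return "DRAFT: " + prompt[:60]

def cgi2_summarize(reviews: list[str], rounds: int = 2) -> str:
    # Stage 1 (decomposition): draft an initial meta-review.
    summary = llm("Summarize these reviews into a meta-review:\n"
                  + "\n".join(reviews))
    # Stage 2 (introspection): critique against each checklist
    # question, then revise the summary using the critique.
    for _ in range(rounds):
        for question in CHECKLIST:
            critique = llm(f"Check the summary against: {question}\n"
                           f"{summary}")
            summary = llm(f"Revise the summary using this critique:\n"
                          f"{critique}\n{summary}")
    return summary
```

With a real LLM behind `llm()`, each pass would surface one checklist concern at a time, which is what lets the refinement target specific meta-review requirements rather than rewriting blindly.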
URL
https://arxiv.org/abs/2305.14647