Abstract
Inquisitive questions -- open-ended, curiosity-driven questions people ask as they read -- are an integral part of discourse processing (Kehler and Rohde, 2017; Onea, 2016) and comprehension (Prince, 2004). Recent work in NLP has taken advantage of question generation capabilities of LLMs to enhance a wide range of applications. But the space of inquisitive questions is vast: many questions can be evoked from a given context. So which of those should be prioritized to find answers? Linguistic theories, unfortunately, have not yet provided an answer to this question. This paper presents QSALIENCE, a salience predictor of inquisitive questions. QSALIENCE is instruction-tuned over our dataset of linguist-annotated salience scores of 1,766 (context, question) pairs. A question scores high on salience if answering it would greatly enhance the understanding of the text (Van Rooy, 2003). We show that highly salient questions are empirically more likely to be answered in the same article, bridging potential questions (Onea, 2016) with Questions Under Discussion (Roberts, 2012). We further validate our findings by showing that answering salient questions is an indicator of summarization quality in news.
URL
https://arxiv.org/abs/2404.10917