Abstract
Large language models (LMs) are susceptible to producing text with hallucinated content. Self-contradiction, where the LM generates two contradictory sentences within the same context, is an important form of hallucination. In this work, we present a comprehensive analysis of self-contradiction for state-of-the-art, instruction-tuned LMs, covering evaluation, detection, and mitigation. To effectively trigger self-contradictions, we design a framework that constrains LMs to generate appropriate sentence pairs. Our evaluation on these sentence pairs reveals that self-contradictions occur frequently across different LMs for both famous and lesser-known topics. Next, we prompt the LMs to detect self-contradictions. Our results indicate that ChatGPT and GPT-4 are able to accurately identify self-contradictions, while Vicuna-13B struggles to do so. For example, with our best prompting method, ChatGPT achieves 91.0% precision and 80.5% recall on the sentence pairs generated by itself. To automatically mitigate self-contradictions, we develop an iterative algorithm that prompts the LMs to remove the detected self-contradictions from the generated text. Our algorithm successfully revises the text such that self-contradictions are significantly reduced, while maintaining its fluency and informativeness. Importantly, our entire pipeline of triggering, detecting, and mitigating self-contradictions is applicable to black-box LMs and does not require any external grounded knowledge.
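The detect-and-mitigate loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `lm` stands for any black-box LM call (prompt in, completion out), and the prompt wordings, the `mitigate` signature, and the `max_rounds` cap are all assumptions for illustration.

```python
from typing import Callable

# A black-box LM: takes a prompt string, returns a completion string.
LM = Callable[[str], str]


def is_contradiction(lm: LM, context: str, s1: str, s2: str) -> bool:
    """Prompt the LM to judge whether two sentences about the same
    context contradict each other (illustrative prompt wording)."""
    answer = lm(
        f"Context: {context}\nSentence A: {s1}\nSentence B: {s2}\n"
        "Do sentences A and B contradict each other? Answer Yes or No."
    )
    return answer.strip().lower().startswith("yes")


def mitigate(lm: LM, context: str, s1: str, s2: str,
             max_rounds: int = 3) -> tuple[str, str]:
    """Iteratively prompt the LM to rewrite sentence B until the
    detector no longer flags a contradiction with sentence A."""
    for _ in range(max_rounds):
        if not is_contradiction(lm, context, s1, s2):
            break  # no self-contradiction detected; keep the text
        s2 = lm(
            f"Context: {context}\nSentence A: {s1}\nSentence B: {s2}\n"
            "Rewrite sentence B so it no longer contradicts sentence A, "
            "keeping it fluent and informative."
        )
    return s1, s2
```

Because both detection and revision happen purely through prompting, the same loop applies to any black-box LM without access to model weights or external grounded knowledge, matching the property highlighted in the abstract.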
URL
https://arxiv.org/abs/2305.15852