Paper Reading AI Learner

Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

2023-05-25 08:43:46
Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev

Abstract

Large language models (large LMs) are susceptible to producing text with hallucinated content. Self-contradiction, where the LM generates two contradictory sentences within the same context, is an important form of hallucination. In this work, we present a comprehensive analysis on self-contradiction for state-of-the-art, instruction-tuned LMs, including evaluation, detection, and mitigation. To effectively trigger self-contradictions, we design a framework that constrains LMs to generate appropriate sentence pairs. Our evaluation on these sentence pairs reveals that self-contradictions occur frequently across different LMs for both famous and lesser-known topics. Next, we prompt the LMs to detect self-contradictions. Our results indicate that ChatGPT and GPT-4 are able to accurately identify self-contradictions, while Vicuna-13B struggles to do so. For example, with our best prompting method, ChatGPT achieves 91.0% precision and 80.5% recall on the sentence pairs generated by itself. To automatically mitigate self-contradictions, we develop an iterative algorithm that prompts the LMs to remove the detected self-contradictions from the generated text. Our algorithm successfully revises the text such that self-contradictions are significantly reduced, while maintaining its fluency and informativeness. Importantly, our entire pipeline of triggering, detecting, and mitigating self-contradictions is applicable to black-box LMs and does not require any external grounded knowledge.
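The trigger–detect–mitigate pipeline the abstract describes can be sketched as follows. This is a minimal illustration only: the prompt wording, the `lm` callable, and the `alt_of` helper are hypothetical stand-ins for calls to a black-box instruction-tuned LM (e.g. ChatGPT); the paper's actual prompting methods differ.

```python
from typing import Callable

# A black-box LM is modeled as a function from prompt to completion;
# no external grounded knowledge is consulted, matching the paper's setting.
LM = Callable[[str], str]

def detect_contradiction(lm: LM, context: str, s1: str, s2: str) -> bool:
    """Prompt the LM to judge whether two sentences about the same context
    contradict each other (illustrative prompt, not the paper's)."""
    answer = lm(
        f"Context: {context}\n"
        f"Sentence A: {s1}\nSentence B: {s2}\n"
        "Do sentence A and sentence B contradict each other? Answer Yes or No."
    )
    return answer.strip().lower().startswith("yes")

def mitigate(lm: LM, context: str, sentences: list[str],
             alt_of: Callable[[str], str], max_iters: int = 3) -> list[str]:
    """Iteratively revise generated text to remove detected self-contradictions.

    alt_of(sentence) stands in for the triggering step: it produces an
    alternative sentence for the same context, so the pair can be checked."""
    revised = list(sentences)
    for _ in range(max_iters):
        changed = False
        for i, s in enumerate(revised):
            alt = alt_of(s)
            if detect_contradiction(lm, context, s, alt):
                # Ask the LM to rewrite the sentence without the
                # contradictory information.
                revised[i] = lm(
                    f"Context: {context}\n"
                    f'Revise this sentence to remove information that '
                    f'contradicts: "{alt}"\nSentence: "{s}"'
                )
                changed = True
        if not changed:
            break  # converged: no contradictions detected in this pass
    return revised
```

In practice `lm` would wrap an API call to a chat model and `alt_of` would re-query the model under the paper's sentence-pair constraints; the loop structure above is the part that mirrors the iterative mitigation algorithm.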

Abstract (translated)

Large language models (large LMs) are prone to generating text with hallucinated content. Self-contradiction, where the LM generates two mutually contradictory sentences within the same context, is an important form of hallucination. In this work, we conduct a comprehensive analysis of self-contradiction for state-of-the-art, instruction-tuned LMs, covering evaluation, detection, and mitigation. To effectively trigger self-contradictions, we design a framework that constrains the LMs to generate appropriate sentence pairs. Our evaluation of these sentence pairs shows that self-contradictions occur frequently across different LMs, for both famous and lesser-known topics. Next, we prompt the LMs to detect self-contradictions. Our results show that ChatGPT and GPT-4 can accurately identify self-contradictions, while Vicuna-13B struggles to do so. For example, with our best prompting method, ChatGPT achieves 91.0% precision and 80.5% recall on the sentence pairs it generated itself. To automatically mitigate self-contradictions, we develop an iterative algorithm that prompts the LM to remove the detected self-contradictions from the generated text. Our algorithm successfully revises the text so that self-contradictions are significantly reduced while preserving its fluency and informativeness. Importantly, our entire pipeline of triggering, detecting, and mitigating self-contradictions applies to black-box LMs and requires no external grounded knowledge.

URL

https://arxiv.org/abs/2305.15852

PDF

https://arxiv.org/pdf/2305.15852.pdf

