Paper Reading AI Learner

The Curious Case of Factuality Finetuning: Models' Internal Beliefs Can Improve Factuality

2025-07-11 07:34:34
Benjamin Newman, Abhilasha Ravichander, Jaehun Jung, Rui Xin, Hamish Ivison, Yegor Kuznetsov, Pang Wei Koh, Yejin Choi

Abstract

Language models are prone to hallucination: generating text that is factually incorrect. Finetuning models on high-quality factual information can potentially reduce hallucination, but concerns remain: obtaining factual gold data can be expensive, and training on correct but unfamiliar data may lead to even more downstream hallucination. What data should practitioners finetune on to mitigate hallucinations in language models? In this work, we study the relationship between the factuality of finetuning data and the prevalence of hallucinations in long-form generation tasks. Counterintuitively, we find that finetuning on factual gold data is not as helpful as finetuning on model-generated data that models believe to be factual. Next, we evaluate filtering strategies applied to both factual gold data and model-generated data, and find that finetuning on model-generated data filtered by models' own internal judgments often leads to better overall factuality than other configurations: training on gold data filtered by models' judgments, training on gold data alone, or training on model-generated data that is supported by gold data. These factuality improvements transfer across the three domains we study, suggesting that a model's own beliefs can provide a powerful signal for factuality.
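The core idea above — keeping only the model-generated claims that the model itself judges to be factual — can be sketched as a simple confidence filter. This is a hypothetical illustration, not the paper's actual pipeline: `belief_score` stands in for any procedure that elicits the model's probability that a claim is true (for example, the probability assigned to "True" when the model is asked whether the claim is factual), and the threshold value is an arbitrary choice.

```python
# Hypothetical sketch of internal-belief filtering (not the authors' code).
# Assumption: belief_score(claim) returns the model's own probability that
# the claim is factual, e.g. P("True") under a verification prompt.

def filter_by_internal_belief(claims, belief_score, threshold=0.8):
    """Keep only model-generated claims the model itself believes are factual."""
    return [c for c in claims if belief_score(c) >= threshold]

# Toy stand-in for a real model's probability judgments.
toy_beliefs = {
    "Marie Curie won two Nobel Prizes.": 0.97,
    "Marie Curie was born in 1901.": 0.15,  # the model doubts this claim
}

kept = filter_by_internal_belief(list(toy_beliefs), toy_beliefs.get)
# kept contains only the high-confidence claim
```

The filtered claims would then serve as the finetuning corpus, replacing both unfiltered gold data and gold-supported model generations in the comparison the abstract describes.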

URL

https://arxiv.org/abs/2507.08371

PDF

https://arxiv.org/pdf/2507.08371.pdf

