Abstract
Language models are prone to hallucination: generating text that is factually incorrect. Finetuning models on high-quality factual information can potentially reduce hallucination, but concerns remain: obtaining factual gold data can be expensive, and training on correct but unfamiliar data may lead to even more downstream hallucination. What data should practitioners finetune on to mitigate hallucinations in language models? In this work, we study the relationship between the factuality of finetuning data and the prevalence of hallucinations in long-form generation tasks. Counterintuitively, we find that finetuning on factual gold data is not as helpful as finetuning on model-generated data that models believe to be factual. Next, we evaluate filtering strategies applied to both factual gold data and model-generated data, and find that finetuning on model-generated data filtered by models' own internal judgments often leads to better overall factuality than other configurations: training on gold data filtered by models' judgments, training on gold data alone, or training on model-generated data that is supported by gold data. These factuality improvements transfer across the three domains we study, suggesting that a model's own beliefs can provide a powerful signal for factuality.
URL
https://arxiv.org/abs/2507.08371