Abstract
Ensuring contextual faithfulness in retrieval-augmented large language models (LLMs) is crucial for building trustworthy information-seeking systems, particularly in long-form question-answering (LFQA) scenarios. In this work, we identify a salient correlation between LFQA faithfulness and retrieval heads, a set of attention heads responsible for retrieving contextual information. Leveraging this insight, we propose RHIO, a framework designed to teach LLMs to explicitly discriminate between faithful and unfaithful generations. RHIO first augments unfaithful samples that simulate realistic model-intrinsic errors by selectively masking retrieval heads. These samples are then incorporated into joint training, enabling the model to distinguish unfaithful outputs from faithful ones conditioned on control tokens. Furthermore, these control tokens are leveraged to self-induce contrastive outputs, and the difference between them is amplified through contrastive decoding. To facilitate the evaluation of contextual faithfulness, we also introduce GroundBench, a comprehensive benchmark compiled from five existing LFQA datasets. Extensive experimental results on GroundBench demonstrate that RHIO significantly improves faithfulness, even outperforming GPT-4o.
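The abstract names two mechanisms: masking retrieval heads to produce unfaithful training samples, and contrastive decoding between outputs conditioned on control tokens. The sketch below is not the authors' implementation; the control tokens, the head indices, the tensor shapes, and the weight `alpha` are illustrative assumptions meant only to make the two steps concrete.

```python
import torch

# Assumed control tokens prepended to the prompt during joint training/decoding.
FAITHFUL_TOKEN = "<faithful>"
UNFAITHFUL_TOKEN = "<unfaithful>"

def mask_retrieval_heads(per_head_attn_out: torch.Tensor,
                         heads_to_mask: list[int]) -> torch.Tensor:
    """Zero the outputs of selected attention heads in one layer, simulating the
    loss of context retrieval that yields "unfaithful" samples.
    `per_head_attn_out` has shape (batch, num_heads, seq_len, head_dim);
    the head indices here are hypothetical, not the paper's identified set."""
    out = per_head_attn_out.clone()
    out[:, heads_to_mask] = 0.0
    return out

def contrastive_next_token(logits_faithful: torch.Tensor,
                           logits_unfaithful: torch.Tensor,
                           alpha: float = 0.5) -> int:
    """One contrastive decoding step: amplify what the faithful-conditioned pass
    prefers over the unfaithful-conditioned pass when choosing the next token."""
    log_p_f = torch.log_softmax(logits_faithful, dim=-1)
    log_p_u = torch.log_softmax(logits_unfaithful, dim=-1)
    return int(torch.argmax(log_p_f + alpha * (log_p_f - log_p_u)))

# Toy usage with random tensors standing in for real activations and logits.
attn = torch.randn(1, 32, 128, 64)
masked_attn = mask_retrieval_heads(attn, heads_to_mask=[3, 9])
print(contrastive_next_token(torch.randn(32_000), torch.randn(32_000)))
```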
Abstract (translated)
Ensuring the contextual faithfulness of retrieval-augmented large language models (LLMs) is crucial for building trustworthy information-seeking systems, especially in long-form question-answering (LFQA) scenarios. In this work, we identify a salient correlation between LFQA faithfulness and retrieval heads, a set of attention heads responsible for retrieving contextual information. Building on this insight, we propose RHIO, a framework designed to teach LLMs to explicitly distinguish faithful from unfaithful generations. RHIO first augments unfaithful samples that simulate realistic model-intrinsic errors by selectively masking retrieval heads. These samples are then incorporated into joint training, enabling the model to distinguish unfaithful outputs from faithful ones conditioned on control tokens. Furthermore, these control tokens are used to self-induce contrastive outputs, and the difference between them is amplified through contrastive decoding. To facilitate the evaluation of contextual faithfulness, we also introduce GroundBench, a comprehensive benchmark compiled from five existing LFQA datasets. Extensive experimental results on GroundBench show that RHIO significantly improves faithfulness, even outperforming GPT-4o.
URL
https://arxiv.org/abs/2501.13573