Abstract
Retrieval-augmented generation (RAG) has become a common strategy for updating large language model (LLM) responses with current, external information. However, models may still rely on memorized training data, bypass the retrieved evidence, and produce contaminated outputs. We introduce Retrieval-Path Contamination Scoring (RePCS), a diagnostic method that detects such behavior without requiring model access or retraining. RePCS compares two inference paths: (i) a parametric path using only the query, and (ii) a retrieval-augmented path using both the query and retrieved context, and computes the Kullback-Leibler (KL) divergence between their output distributions. A low divergence suggests that the retrieved context had minimal impact, indicating potential memorization. This procedure is model-agnostic, requires no gradient or internal state access, and adds only a single additional forward pass. We further derive PAC-style guarantees that link the KL threshold to user-defined false positive and false negative rates. On the Prompt-WNQA benchmark, RePCS achieves a ROC-AUC of 0.918. This result outperforms the strongest prior method by 6.5 percentage points while keeping latency overhead below 4.7% on an NVIDIA T4 GPU. RePCS offers a lightweight, black-box safeguard to verify whether a RAG system meaningfully leverages retrieval, making it especially valuable in safety-critical applications.
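The core comparison can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the threshold value, function names, and toy distributions are all assumptions; the actual RePCS scoring and its PAC-derived threshold are defined in the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions.
    eps guards against log(0); both inputs are assumed normalized."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def repcs_score(parametric_probs, rag_probs, threshold=0.05):
    """Compare the retrieval-augmented output distribution against the
    query-only (parametric) one. A divergence below the threshold means
    retrieval barely changed the output, suggesting memorization.
    Returns (kl, flagged). Threshold here is illustrative only."""
    kl = kl_divergence(rag_probs, parametric_probs)
    return kl, kl < threshold

# Toy next-token distributions over a 4-token vocabulary.
p_parametric = [0.70, 0.15, 0.10, 0.05]   # query-only forward pass
p_rag_same   = [0.69, 0.16, 0.10, 0.05]   # retrieval barely moved the output
p_rag_diff   = [0.10, 0.65, 0.15, 0.10]   # retrieval shifted the answer

kl_same, flag_same = repcs_score(p_parametric, p_rag_same)
kl_diff, flag_diff = repcs_score(p_parametric, p_rag_diff)
print(flag_same, flag_diff)  # True False
```

Because both paths are plain forward passes over the same model API, the check needs no gradients or internal state, matching the black-box setting described above.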
URL
https://arxiv.org/abs/2506.15513