Abstract
The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solutions, ranging from non-parametric Retrieval-Augmented Generation (RAG) to parametric knowledge editing, are often constrained in practice by finite context windows, retriever noise, or the risk of catastrophic forgetting. In this paper, we propose DRIFT, a novel dual-model architecture designed to explicitly decouple knowledge extraction from the reasoning process. Unlike static prompt compression, DRIFT employs a lightweight knowledge model to dynamically compress document chunks into implicit fact tokens conditioned on the query. These dense representations are projected into the reasoning model's embedding space, replacing raw, redundant text while maintaining inference accuracy. Extensive experiments show that DRIFT significantly improves performance on long-context tasks, outperforming strong baselines among comparably sized models. Our approach provides a scalable and efficient paradigm for extending the effective context window and reasoning capabilities of LLMs. Our code is available at this https URL.
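The core mechanism described above (a knowledge model compressing a document chunk into a few query-conditioned fact tokens, then projecting them into the reasoning model's embedding space) can be sketched as follows. This is a minimal illustrative sketch with NumPy, not the paper's implementation: the dimensions, the learned-slot attention, and all names (`compress_chunk`, `W_q`, `W_proj`) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper).
D_K = 64      # knowledge-model hidden size
D_R = 128     # reasoning-model embedding size
N_FACT = 4    # implicit fact tokens emitted per chunk
CHUNK_LEN = 256  # raw text tokens in one document chunk

def compress_chunk(chunk_emb, query_emb, slots, W_q):
    """Query-conditioned compression (simplified): N_FACT learned
    slots attend over the chunk, with each slot biased by the query,
    yielding N_FACT dense fact tokens in place of CHUNK_LEN tokens."""
    # (N_FACT, D_K) queries = slots shifted by a projection of the query
    q = slots + query_emb @ W_q
    scores = q @ chunk_emb.T / np.sqrt(D_K)          # (N_FACT, CHUNK_LEN)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)         # softmax over chunk
    return attn @ chunk_emb                          # (N_FACT, D_K)

# Toy tensors standing in for encoder outputs and learned weights.
chunk_emb = rng.standard_normal((CHUNK_LEN, D_K))
query_emb = rng.standard_normal((1, D_K))
slots = rng.standard_normal((N_FACT, D_K))
W_q = rng.standard_normal((D_K, D_K))

fact_tokens = compress_chunk(chunk_emb, query_emb, slots, W_q)

# Project into the reasoning model's embedding space, where the
# fact tokens replace the raw chunk text in its input sequence.
W_proj = rng.standard_normal((D_K, D_R))
projected = fact_tokens @ W_proj

print(projected.shape)  # (4, 128): 256 text tokens reduced to 4 dense tokens
```

The point of the sketch is the interface, not the internals: the reasoning model never sees the raw chunk, only a handful of dense vectors already living in its own embedding space, which is what lets the effective context window grow without lengthening the token sequence.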
URL
https://arxiv.org/abs/2602.10021