Paper Reading AI Learner

Decoupled Reasoning with Implicit Fact Tokens: A Dual-Model Framework for Efficient Long-Context Inference

2026-02-10 17:42:31
Wenxuan Xie, Yujia Wang, Xin Tan, Chaochao Lu, Xia Hu, Xuhong Wang

Abstract

The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solutions, ranging from non-parametric Retrieval-Augmented Generation (RAG) to parametric knowledge editing, are often constrained in practice by finite context windows, retriever noise, or the risk of catastrophic forgetting. In this paper, we propose DRIFT, a novel dual-model architecture designed to explicitly decouple knowledge extraction from the reasoning process. Unlike static prompt compression, DRIFT employs a lightweight knowledge model to dynamically compress document chunks into implicit fact tokens conditioned on the query. These dense representations are projected into the reasoning model's embedding space, replacing raw, redundant text while maintaining inference accuracy. Extensive experiments show that DRIFT significantly improves performance on long-context tasks, outperforming strong baselines among comparably sized models. Our approach provides a scalable and efficient paradigm for extending the effective context window and reasoning capabilities of LLMs. Our code is available at this https URL.
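The core mechanism the abstract describes (a lightweight knowledge model compressing a document chunk into a few dense "fact tokens" that are projected into the reasoning model's embedding space) can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: the dimensions, the mean-pooling compressor (which ignores the query conditioning the real method uses), and the random linear projector are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): knowledge-model hidden
# size, reasoning-model embedding size, and number of fact tokens.
D_KNOW, D_REASON, N_FACT = 512, 1024, 8

def compress_chunk(chunk_hidden_states, n_fact=N_FACT):
    """Toy stand-in for the knowledge model: pool a chunk's hidden
    states into a fixed number of dense 'fact token' vectors.
    (The actual method conditions this compression on the query.)"""
    groups = np.array_split(chunk_hidden_states, n_fact, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])  # (n_fact, D_KNOW)

# Learned linear projector into the reasoning model's embedding space
# (here just random weights for shape illustration).
W_proj = rng.standard_normal((D_KNOW, D_REASON)) / np.sqrt(D_KNOW)

def fact_tokens_for(chunk_hidden_states):
    facts = compress_chunk(chunk_hidden_states)  # (N_FACT, D_KNOW)
    return facts @ W_proj                        # (N_FACT, D_REASON)

# A 300-token document chunk is replaced by 8 fact-token embeddings,
# which are prepended to the query's token embeddings before the
# reasoning model runs.
chunk = rng.standard_normal((300, D_KNOW))
query_emb = rng.standard_normal((20, D_REASON))
reasoning_input = np.concatenate([fact_tokens_for(chunk), query_emb], axis=0)
print(reasoning_input.shape)  # (28, 1024)
```

The point of the sketch is the context saving: 300 raw-text token positions shrink to 8 dense vectors, so the reasoning model's effective context grows by roughly the compression ratio per chunk.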

URL

https://arxiv.org/abs/2602.10021

PDF

https://arxiv.org/pdf/2602.10021.pdf

