Abstract
Personalizing large language models (LLMs) to individual users requires incorporating extensive interaction histories and profiles, but input token constraints make this impractical due to high inference latency and API costs. Existing approaches rely on heuristic methods such as selecting recent interactions or prompting summarization models to compress user profiles. However, these methods treat context as a monolithic whole and fail to consider how LLMs internally process and prioritize different profile components. We investigate whether LLMs' attention patterns can effectively identify important personalization signals for intelligent context compression. Through preliminary studies on representative personalization tasks, we discover that (a) LLMs' attention patterns naturally reveal important signals, and (b) fine-tuning enhances LLMs' ability to distinguish between relevant and irrelevant information. Based on these insights, we propose Attn-GS, an attention-guided context compression framework that leverages attention feedback from a marking model to mark important personalization sentences, then guides a compression model to generate task-relevant, high-quality compressed user contexts. Extensive experiments demonstrate that Attn-GS significantly outperforms various baselines across different tasks, token limits, and settings, achieving performance close to full-context prompting while reducing token usage by a factor of 50.
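The marking-then-compression pipeline described above can be sketched in minimal form: score each profile sentence by the attention mass a marking model assigns to its tokens, then keep the highest-scoring sentences that fit a token budget. This is a hypothetical illustration, not the paper's implementation; real scores would come from a fine-tuned marking LLM, and here placeholder attention values and a crude whitespace token count stand in for them.

```python
# Hypothetical sketch of attention-guided sentence selection, the core idea
# behind an Attn-GS-style marking step. The attention values below are
# placeholders; a real system would extract them from a marking model.

def score_sentences(sentences, token_attn):
    """Average the per-token attention mass over each sentence."""
    scores, i = [], 0
    for sent in sentences:
        n = len(sent.split())  # crude whitespace token count (assumption)
        scores.append(sum(token_attn[i:i + n]) / n)
        i += n
    return scores

def select_under_budget(sentences, scores, token_budget):
    """Greedily keep the highest-attention sentences within the budget."""
    order = sorted(range(len(sentences)), key=lambda k: scores[k], reverse=True)
    kept, used = set(), 0
    for k in order:
        n = len(sentences[k].split())
        if used + n <= token_budget:
            kept.add(k)
            used += n
    # preserve original order so the compressed profile stays coherent
    return [sentences[k] for k in sorted(kept)]

profile = [
    "User prefers concise answers.",         # 4 tokens
    "Logged in from Berlin yesterday.",      # 5 tokens
    "Frequently asks about Rust lifetimes.", # 5 tokens
]
# placeholder per-token attention mass (14 values, one per token)
attn = [0.9, 0.8, 0.9, 0.7,
        0.1, 0.1, 0.2, 0.1, 0.1,
        0.6, 0.7, 0.8, 0.6, 0.7]
scores = score_sentences(profile, attn)
compressed = select_under_budget(profile, scores, token_budget=9)
print(compressed)  # keeps the two high-attention sentences, drops the low one
```

In the full framework this marked subset would then be passed to a compression model that rewrites it into a task-relevant summary, rather than being used verbatim as here.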
URL
https://arxiv.org/abs/2602.07778