Paper Reading AI Learner

Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors

2025-10-01 07:07:22
Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen

Abstract

Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned documents for each target phrase. We introduce Eyes-on-Me, a modular attack that decomposes an adversarial document into reusable Attention Attractors and Focus Regions. Attractors are optimized to direct attention to the Focus Region. Attackers can then insert semantic baits for the retriever or malicious instructions for the generator, adapting to new targets at near-zero cost. This is achieved by steering a small subset of attention heads that we empirically identify as strongly correlated with attack success. Across 18 end-to-end RAG settings (3 datasets $\times$ 2 retrievers $\times$ 3 generators), Eyes-on-Me raises average attack success rates from 21.9 to 57.8 (+35.9 points, 2.6$\times$ over prior work). A single optimized attractor transfers to unseen black-box retrievers and generators without retraining. Our findings establish a scalable paradigm for RAG data poisoning and show that modular, reusable components pose a practical threat to modern AI systems. They also reveal a strong link between attention concentration and model outputs, informing interpretability research.
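The modular decomposition the abstract describes can be sketched in a few lines: reusable attractor segments wrap a swappable focus region, and the quantity being optimized is the share of attention mass that selected heads place on that region. This is a minimal illustrative sketch; the function names, the document layout, and the toy attention tensor are assumptions, not the paper's implementation.

```python
import numpy as np

def build_poisoned_doc(attractor_prefix, attractor_suffix, bait, instruction):
    """Assemble a poisoned document from reusable parts (illustrative layout).

    The attractor segments are optimized once and reused; only the focus
    region (retriever bait + generator instruction) changes per target.
    """
    focus_region = f"{bait} {instruction}"
    return f"{attractor_prefix} {focus_region} {attractor_suffix}"

def attention_mass_on_focus(attn, focus_span):
    """Fraction of total attention mass falling on the focus-region tokens.

    attn: array of shape (num_heads, query_len, key_len) for the subset of
          heads being steered; each query row is a probability distribution.
    focus_span: (start, end) key positions covered by the focus region.
    """
    start, end = focus_span
    return attn[:, :, start:end].sum() / attn.sum()

# Toy check: under uniform attention over 10 key positions, a 3-token
# focus region receives 30% of the mass; the attack optimizes the
# attractor text to push this fraction far higher.
doc = build_poisoned_doc("ATTR_PRE", "ATTR_SUF",
                         "semantic bait for retriever",
                         "malicious instruction for generator")
uniform_attn = np.full((2, 5, 10), 0.1)  # 2 heads, 5 queries, 10 keys
mass = attention_mass_on_focus(uniform_attn, (4, 7))
```

In a real pipeline the attention tensor would come from a white-box surrogate model (e.g. a forward pass with attentions exposed), and the attractor tokens would be updated to maximize this mass before transferring the document to black-box targets.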


URL

https://arxiv.org/abs/2510.00586

PDF

https://arxiv.org/pdf/2510.00586.pdf

