Abstract
Indirect Prompt Injection attacks exploit the inherent limitation of Large Language Models (LLMs) to distinguish between instructions and data in their inputs. Despite numerous defense proposals, the systematic evaluation against adaptive adversaries remains limited, even when successful attacks can have wide security and privacy implications, and many real-world LLM-based applications remain vulnerable. We present the results of LLMail-Inject, a public challenge simulating a realistic scenario in which participants adaptively attempted to inject malicious instructions into emails in order to trigger unauthorized tool calls in an LLM-based email assistant. The challenge spanned multiple defense strategies, LLM architectures, and retrieval configurations, resulting in a dataset of 208,095 unique attack submissions from 839 participants. We release the challenge code, the full dataset of submissions, and our analysis demonstrating how this data can provide new insights into the instruction-data separation problem. We hope this will serve as a foundation for future research towards practical structural solutions to prompt injection.
Abstract (translated)
间接提示注入攻击利用了大型语言模型(LLM)在区分输入中的指令和数据方面的内在限制。尽管提出了许多防御建议,但针对适应性对手的系统评估仍有限制,即使成功的攻击可能具有广泛的安保和隐私影响,许多基于LLM的真实世界应用程序仍然易受攻击。我们展示了LLMail-Inject的结果,这是一项公开挑战,模拟了一个现实场景,在该场景中参与者尝试将恶意指令注入电子邮件以触发LLM基础邮件助手的未经授权的工具调用。此挑战涵盖了多种防御策略、LLM架构和检索配置,并产生了来自839名参与者的208,095个独特攻击提交的数据集。我们发布了挑战代码、完整的提交数据集以及我们的分析,展示如何利用这些数据为指令与数据分离问题提供新的见解。我们希望这将成为未来研究的基础,以寻求解决提示注入的实用结构化解决方案。
URL
https://arxiv.org/abs/2506.09956