Abstract
Large language models (LLMs) have gained widespread prominence, yet their vulnerability to prompt injection and other adversarial attacks remains a critical concern. This paper argues for a security-by-design AI paradigm that proactively mitigates LLM vulnerabilities while enhancing performance. To this end, we introduce PromptShield, an ontology-driven framework that ensures deterministic and secure prompt interactions. It standardizes user inputs through semantic validation, eliminating ambiguity and mitigating adversarial manipulation. To assess PromptShield's security and performance, we conducted an experiment on an agent-based system that analyzes Amazon Web Services (AWS) cloud logs containing 493 distinct events related to malicious activities and anomalies. We simulated prompt injection attacks and measured the impact of deploying PromptShield. The results demonstrate a significant improvement in model security and performance, with precision, recall, and F1 scores of approximately 94%. Notably, the ontology-based framework not only mitigates adversarial threats but also improves the overall performance and reliability of the system. Furthermore, PromptShield's modular and adaptable design makes it applicable beyond cloud security, as a means of safeguarding generative AI applications across various domains. By laying the groundwork for AI safety standards and informing future policy development, this work aims to stimulate discussion of the role of deterministic prompt engineering and ontology-based validation in the safe and responsible deployment of LLMs in high-stakes environments.
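The abstract does not specify how the semantic validation works, so the following is only a minimal sketch of the general idea, under stated assumptions: user input is reduced to an action plus named slots, both are checked against a small ontology, and only a canonical rendering of the validated request reaches the model. The ontology contents, the slot names, and the functions `validate` and `render` are hypothetical illustrations, not PromptShield's actual design.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: each allowed action and the slot names it accepts.
ONTOLOGY = {
    "summarize_events": {"log_source"},
    "detect_anomalies": {"log_source", "time_window"},
}

# Hypothetical closed vocabularies for each slot (illustrative values only).
ALLOWED_VALUES = {
    "log_source": {"cloudtrail", "vpc_flow", "guardduty"},
    "time_window": {"1h", "24h", "7d"},
}


@dataclass(frozen=True)
class ValidatedPrompt:
    action: str
    slots: tuple  # sorted (name, value) pairs, so ordering is canonical


def validate(action: str, slots: dict) -> ValidatedPrompt:
    """Accept a request only if every part of it is defined in the ontology."""
    if action not in ONTOLOGY:
        raise ValueError(f"unknown action: {action!r}")
    for name, value in slots.items():
        if name not in ONTOLOGY[action]:
            raise ValueError(f"slot {name!r} not allowed for {action!r}")
        if value not in ALLOWED_VALUES[name]:
            raise ValueError(f"value {value!r} not allowed for slot {name!r}")
    return ValidatedPrompt(action, tuple(sorted(slots.items())))


def render(prompt: ValidatedPrompt) -> str:
    """Build the model-facing text from validated terms only (no free text)."""
    args = ", ".join(f"{name}={value}" for name, value in prompt.slots)
    return f"ACTION={prompt.action}; {args}"
```

In this sketch, `validate("detect_anomalies", {"time_window": "24h", "log_source": "cloudtrail"})` renders to the same string regardless of the order in which slots were supplied, while a slot value such as `"ignore previous instructions"` raises `ValueError` because it is not a term in the vocabulary. Restricting input to closed vocabularies is one simple way to obtain deterministic prompts; a real ontology would presumably be richer than this.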
URL
https://arxiv.org/abs/2510.00451