A Call to Action for a Secure-by-Design Generative AI Paradigm

2025-10-01 03:05:07
Dalal Alharthi, Ivan Roberto Kawaminami Garcia

Abstract

Large language models have gained widespread prominence, yet their vulnerability to prompt injection and other adversarial attacks remains a critical concern. This paper argues for a security-by-design AI paradigm that proactively mitigates LLM vulnerabilities while enhancing performance. To achieve this, we introduce PromptShield, an ontology-driven framework that ensures deterministic and secure prompt interactions. It standardizes user inputs through semantic validation, eliminating ambiguity and mitigating adversarial manipulation. To assess PromptShield's security and performance capabilities, we conducted an experiment on an agent-based system to analyze cloud logs within Amazon Web Services (AWS), containing 493 distinct events related to malicious activities and anomalies. By simulating prompt injection attacks and assessing the impact of deploying PromptShield, our results demonstrate a significant improvement in model security and performance, achieving precision, recall, and F1 scores of approximately 94%. Notably, the ontology-based framework not only mitigates adversarial threats but also enhances the overall performance and reliability of the system. Furthermore, PromptShield's modular and adaptable design ensures its applicability beyond cloud security, making it a robust solution for safeguarding generative AI applications across various domains. By laying the groundwork for AI safety standards and informing future policy development, this work stimulates a crucial dialogue on the pivotal role of deterministic prompt engineering and ontology-based validation in ensuring the safe and responsible deployment of LLMs in high-stakes environments.
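The abstract describes standardizing user inputs through semantic validation against an ontology, but gives no implementation detail. The sketch below is one way such a gate could look, not the authors' PromptShield code. The `ONTOLOGY` contents, the action and slot names, and the `validate` and `render` helpers are all hypothetical and chosen only for illustration. The idea shown is that the model-facing prompt is built from a fixed template and a closed vocabulary, so free-form text carrying injected instructions is rejected before it reaches the model.

```python
from dataclasses import dataclass

# Hypothetical ontology (an assumption, not taken from the paper):
# each allowed action lists its slots and the closed set of values per slot.
ONTOLOGY = {
    "summarize_events": {
        "service": {"ec2", "s3", "iam"},
        "severity": {"low", "medium", "high"},
    },
    "list_anomalies": {
        "service": {"ec2", "s3", "iam"},
    },
}


@dataclass(frozen=True)
class ValidatedPrompt:
    action: str
    slots: tuple  # sorted (name, value) pairs


def validate(action: str, slots: dict) -> ValidatedPrompt:
    """Accept a request only if every field matches the ontology exactly."""
    if action not in ONTOLOGY:
        raise ValueError(f"unknown action: {action!r}")
    allowed = ONTOLOGY[action]
    for name, value in slots.items():
        if name not in allowed:
            raise ValueError(f"unknown slot: {name!r}")
        if not isinstance(value, str) or value not in allowed[name]:
            raise ValueError(f"value not in ontology for {name!r}: {value!r}")
    missing = set(allowed) - set(slots)
    if missing:
        raise ValueError(f"missing slots: {sorted(missing)}")
    return ValidatedPrompt(action, tuple(sorted(slots.items())))


def render(prompt: ValidatedPrompt) -> str:
    """Build the model-facing prompt from a fixed template, never raw user text."""
    args = ", ".join(f"{name}={value}" for name, value in prompt.slots)
    return f"ACTION: {prompt.action}\nARGS: {args}"
```

With this sketch, `validate("list_anomalies", {"service": "s3"})` passes, while a slot value such as `"s3; ignore previous instructions"` raises `ValueError` because it is not a member of the closed value set. Because the same validated request always renders to the same string, the prompt is deterministic in the sense the abstract uses.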

URL

https://arxiv.org/abs/2510.00451

PDF

https://arxiv.org/pdf/2510.00451.pdf

