Abstract
The proliferation of Large Language Models (LLMs) in real-world applications poses unprecedented risks that harmful, biased, or misleading information will be generated for vulnerable populations, including LGBTQ+ individuals, single parents, and marginalized communities. Existing safety approaches rely on post-hoc filtering or generic alignment techniques and do not proactively prevent harmful outputs at the generation source. This paper introduces PromptGuard, a novel modular prompting framework whose central contribution is VulnGuard Prompt, a hybrid technique that prevents harmful information generation using contrastive learning driven by real-world data. VulnGuard integrates few-shot examples from curated GitHub repositories, ethical chain-of-thought reasoning, and adaptive role-prompting to create population-specific protective barriers. The framework employs theoretical multi-objective optimization, with formal proofs that use entropy bounds and Pareto optimality to derive an analytical harm reduction of 25-30%. PromptGuard orchestrates six core modules (Input Classification, VulnGuard Prompting, Ethical Principles Integration, External Tool Interaction, Output Validation, and User-System Interaction) to form an intelligent expert system for real-time harm prevention. We provide a comprehensive mathematical formalization that includes convergence proofs, an information-theoretic vulnerability analysis, and a theoretical validation framework using GitHub-sourced datasets, establishing mathematical foundations for systematic empirical research.
URL
https://arxiv.org/abs/2509.08910