
PromptGuard: An Orchestrated Prompting Framework for Principled Synthetic Text Generation for Vulnerable Populations using LLMs with Enhanced Safety, Fairness, and Controllability

2025-09-10 18:14:52
Tung Vu, Lam Nguyen, Quynh Dao

Abstract

The proliferation of Large Language Models (LLMs) in real-world applications poses unprecedented risks of generating harmful, biased, or misleading information for vulnerable populations, including LGBTQ+ individuals, single parents, and marginalized communities. Existing safety approaches rely on post-hoc filtering or generic alignment techniques, and they fail to proactively prevent harmful outputs at the generation source. This paper introduces PromptGuard, a novel modular prompting framework with our breakthrough contribution: VulnGuard Prompt, a hybrid technique that prevents harmful information generation using real-world data-driven contrastive learning. VulnGuard integrates few-shot examples from curated GitHub repositories, ethical chain-of-thought reasoning, and adaptive role-prompting to create population-specific protective barriers. Our framework employs theoretical multi-objective optimization with formal proofs demonstrating 25-30% analytical harm reduction through entropy bounds and Pareto optimality. PromptGuard orchestrates six core modules (Input Classification, VulnGuard Prompting, Ethical Principles Integration, External Tool Interaction, Output Validation, and User-System Interaction), creating an intelligent expert system for real-time harm prevention. We provide a comprehensive mathematical formalization, including convergence proofs, vulnerability analysis using information theory, and a theoretical validation framework using GitHub-sourced datasets, establishing mathematical foundations for systematic empirical research.
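
The abstract names three ingredients of VulnGuard Prompt: few-shot contrastive examples, ethical chain-of-thought reasoning, and adaptive role-prompting. As a rough illustration only, the sketch below shows one way these pieces might be combined into a single prompt string. The abstract does not give the actual prompt template, so every name here (`ContrastiveExample`, `build_prompt`, the wording of the role line and labels) is a hypothetical assumption and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContrastiveExample:
    """Hypothetical few-shot pair: one answer to imitate, one to avoid."""
    query: str
    preferred: str
    avoid: str


def build_prompt(population: str, query: str,
                 examples: List[ContrastiveExample],
                 principles: List[str]) -> str:
    """Assemble a role line, numbered principles, contrastive pairs, and the query.

    The layout is an assumption made for illustration; it is not taken
    from the paper.
    """
    # Adaptive role-prompting: the role line names the target population.
    parts = [f"You are a careful assistant supporting {population}."]
    # Ethical chain-of-thought: ask the model to reason through principles first.
    parts.append("Before answering, reason through these principles:")
    parts.extend(f"{i}. {p}" for i, p in enumerate(principles, start=1))
    # Contrastive few-shot examples: show a good and a bad answer per query.
    for ex in examples:
        parts.append(f"Question: {ex.query}")
        parts.append(f"Preferred answer: {ex.preferred}")
        parts.append(f"Answer to avoid: {ex.avoid}")
    # The actual user query goes last, leaving the answer slot open.
    parts.append(f"Question: {query}")
    parts.append("Preferred answer:")
    return "\n".join(parts)


if __name__ == "__main__":
    demo = ContrastiveExample(
        query="Where can I find childcare support?",
        preferred="Here are some general options you could look into.",
        avoid="You should have planned better.",
    )
    print(build_prompt("single parents", "How do I apply for aid?",
                       [demo], ["Avoid stereotypes", "Do not give false certainty"]))
```

In a fuller pipeline, a string like this would presumably sit between the Input Classification and Output Validation modules the abstract lists; how those modules exchange data is not described there and is left out of this sketch.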

URL

https://arxiv.org/abs/2509.08910

PDF

https://arxiv.org/pdf/2509.08910.pdf

