
CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models

2025-01-09 16:43:21
Junha Park, Ian Ryu, Jaehui Hwang, Hyungkeun Park, Jiyoon Kim, Jong-Seok Lee

Abstract

With advances in diffusion models, image generation has shown significant performance improvements. This raises concerns about the potential abuse of image generation, such as the creation of explicit or violent images, commonly referred to as Not Safe For Work (NSFW) content. To address this, the Stable Diffusion model includes several safety checkers that censor the initial text prompts and the final output images generated by the model. However, recent research has shown that these safety checkers are vulnerable to adversarial attacks, which allow attackers to bypass them and generate NSFW images. In this paper, we find that such adversarial attacks are not robust to small changes in text prompts or input latents. Based on this, we propose CROPS (Circular or RandOm Prompts for Safety), a model-agnostic framework that defends against adversarial attacks aimed at generating NSFW images without requiring additional training. Moreover, we develop an approach that utilizes one-step diffusion models for efficient NSFW detection (CROPS-1), further reducing computational resources. We demonstrate the superiority of our method in terms of performance and applicability.
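
The abstract only sketches the idea at a high level. Below is a minimal, hypothetical illustration of the underlying intuition, not the authors' exact CROPS procedure: because an adversarially crafted prompt or latent is assumed to be fragile, re-running generation under small random perturbations of the input latent and aggregating an off-the-shelf safety checker's verdicts can expose it. The callables `generate(prompt, latents=...)` and `safety_checker(image)` are stand-ins for a latent diffusion pipeline and an NSFW classifier, and the latent shape is assumed to follow Stable Diffusion's 4x64x64 convention.

```python
# Sketch of the intuition only (not the published CROPS algorithm):
# adversarial inputs that evade a safety checker are assumed to lose their
# evasive effect under small input perturbations, so we re-run generation
# several times with perturbed latents and aggregate the checker's decisions.
import torch

def perturbation_based_check(prompt, generate, safety_checker,
                             n_perturbations=4, noise_scale=0.05, seed=0):
    """Return True if the request is judged unsafe under input perturbations."""
    g = torch.Generator().manual_seed(seed)
    latent = torch.randn(1, 4, 64, 64, generator=g)  # SD-style latent shape (assumed)

    flags = []
    for _ in range(n_perturbations):
        # Slightly perturb the input latent; a benign prompt should behave
        # consistently, while an adversarially tuned input tends not to.
        perturbed = latent + noise_scale * torch.randn_like(latent)
        image = generate(prompt, latents=perturbed)
        flags.append(bool(safety_checker(image)))

    # Conservative aggregation: flag the request if any perturbed run is detected.
    return any(flags)
```

The same consistency idea could, in principle, be applied to perturbed text prompts instead of latents, and the abstract's CROPS-1 variant suggests that a one-step diffusion model can stand in for the full sampler to keep this detection inexpensive.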

URL

https://arxiv.org/abs/2501.05359

PDF

https://arxiv.org/pdf/2501.05359.pdf

