Paper Reading AI Learner

Streamlining Image Editing with Layered Diffusion Brushes

2024-05-01 04:30:03
Peyman Gholami, Robert Xiao

Abstract

Denoising diffusion models have recently gained prominence as powerful tools for a variety of image generation and manipulation tasks. Building on this, we propose a novel tool for real-time image editing that gives users fine-grained, region-targeted supervision in addition to existing prompt-based controls. Our editing technique, termed Layered Diffusion Brushes, applies prompt-guided, region-targeted alterations to intermediate denoising steps, enabling precise modifications while maintaining the integrity and context of the input image. We provide an editor built on Layered Diffusion Brushes that incorporates well-known image editing concepts such as layer masks, visibility toggles, and independent manipulation of layers regardless of their order. Our system renders a single edit on a 512x512 image within 140 ms on a high-end consumer GPU, enabling real-time feedback and rapid exploration of candidate edits. We validated our method and editing system through a user study involving both natural images (using inversion) and generated images, which showed its usability and effectiveness for refining images compared with existing techniques such as InstructPix2Pix and Stable Diffusion Inpainting. Our approach is effective across a range of tasks, including object attribute adjustments, error correction, and sequential prompt-based object placement and manipulation, demonstrating its versatility and potential for enhancing creative workflows.
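
The abstract describes the technique only at a high level: an edit alters intermediate denoising steps inside a user-drawn region, and each edit lives on a layer with its own mask and visibility toggle. The following is a toy sketch of that idea under stated assumptions. It is not the authors' implementation. Latents are small 2D lists of floats. `toy_step` is a hypothetical stand-in for a prompt-conditioned denoising step, and the prompt itself is reduced to a single `target` number. The per-step masked blend against a cached trajectory of the unedited image is a generic blended-latent scheme assumed here for illustration. All names (`toy_step`, `blend`, `trajectory`, `Layer`, `render_layer`, `composite`) are hypothetical. Branching from a cached intermediate step means only the remaining steps are re-run, which is one plausible (assumed) way such edits stay cheap.

```python
from dataclasses import dataclass
from typing import List

Grid = List[List[float]]  # toy stand-in for a latent tensor (assumption)


def toy_step(x: Grid, target: float, rate: float = 0.5) -> Grid:
    """Hypothetical denoising step: pull every value part way toward `target`.

    `target` plays the role of a text prompt; a real system would call a
    prompt-conditioned denoising network and a scheduler here instead.
    """
    return [[v + rate * (target - v) for v in row] for row in x]


def blend(edited: Grid, original: Grid, mask: Grid) -> Grid:
    """Keep `edited` where mask == 1.0 and `original` where mask == 0.0."""
    return [
        [m * e + (1.0 - m) * o for e, o, m in zip(erow, orow, mrow)]
        for erow, orow, mrow in zip(edited, original, mask)
    ]


def trajectory(x_init: Grid, target: float, steps: int) -> List[Grid]:
    """Run the toy denoiser on the unedited image and cache every latent."""
    traj = [x_init]
    for _ in range(steps):
        traj.append(toy_step(traj[-1], target))
    return traj


@dataclass
class Layer:
    mask: Grid           # brush region: 1.0 = edit, 0.0 = keep original
    target: float        # hypothetical stand-in for the layer's prompt
    start: int           # cached intermediate step where the edit branches off
    visible: bool = True  # visibility toggle


def render_layer(base: List[Grid], layer: Layer) -> Grid:
    """Re-run only the steps after `layer.start`, confined to the mask."""
    x = base[layer.start]
    for k in range(layer.start, len(base) - 1):
        x = toy_step(x, layer.target)
        # Outside the mask, snap back to the cached unedited trajectory.
        x = blend(x, base[k + 1], layer.mask)
    return x


def composite(base: List[Grid], layers: List[Layer]) -> Grid:
    """Stack visible layers over the unedited result using their masks."""
    out = base[-1]
    for layer in layers:
        if layer.visible:
            out = blend(render_layer(base, layer), out, layer.mask)
    return out


if __name__ == "__main__":
    base = trajectory([[8.0, 8.0], [8.0, 8.0]], target=0.0, steps=4)
    left = Layer(mask=[[1.0, 0.0], [1.0, 0.0]], target=4.0, start=2)
    right = Layer(mask=[[0.0, 1.0], [0.0, 1.0]], target=-4.0, start=2)
    print(composite(base, [left, right]))  # [[3.5, -2.5], [3.5, -2.5]]
```

Because each layer only writes inside its own mask, layers with non-overlapping masks produce the same result in either order, and hiding a layer simply skips it. Where masks overlap, the later layer wins in this sketch, which may differ from how the paper handles overlapping layers.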

URL

https://arxiv.org/abs/2405.00313

PDF

https://arxiv.org/pdf/2405.00313.pdf