Paper Reading AI Learner

DiffStyler: Diffusion-based Localized Image Style Transfer

2024-03-27 11:19:34
Shaoxu Li

Abstract

Image style transfer aims to imbue digital imagery with the distinctive attributes of style targets, such as colors, brushstrokes, and shapes, while preserving the semantic integrity of the content. Despite advances in arbitrary style transfer methods, a prevalent challenge remains: striking a delicate balance between content semantics and style attributes. Recent large-scale text-to-image diffusion models offer unprecedented synthesis capabilities, but at the cost of relying on extensive and often imprecise textual descriptions to delineate artistic styles. To address these limitations, this paper introduces DiffStyler, a novel approach that enables efficient and precise arbitrary image style transfer. At the core of DiffStyler lies a LoRA built on a text-to-image Stable Diffusion model, which encapsulates the essence of each style target. This LoRA, coupled with strategic cross-LoRA feature and attention injection, guides the style transfer process. Our methodology rests on the observation that LoRA maintains the spatial feature consistency of the UNet, a finding that further inspired the development of a mask-wise style transfer technique. This technique employs masks extracted by a pre-trained FastSAM model and uses mask prompts to facilitate feature fusion during the denoising process, enabling localized style transfer that preserves the unaffected regions of the original image. Moreover, our approach accommodates multiple style targets through the use of corresponding masks. Through extensive experimentation, we demonstrate that DiffStyler achieves a more harmonious balance between content preservation and style integration than previous methods.
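The abstract describes localized and multi-style transfer as mask-guided feature fusion during denoising. The sketch below illustrates only the generic idea of mask-gated blending of feature maps, assuming a simple linear blend applied per feature map. The function names (`downsample_mask`, `fuse_features`), the area downsampling, and the blend rule are hypothetical choices for illustration, not DiffStyler's published implementation. The FastSAM masks are replaced by hand-made toy arrays.

```python
import numpy as np


def downsample_mask(mask: np.ndarray, factor: int) -> np.ndarray:
    """Area-downsample an (H, W) mask by an integer factor (hypothetical helper).

    Masks produced at image resolution must be brought down to the spatial
    size of the feature map they gate. H and W must be divisible by factor.
    """
    h, w = mask.shape
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by factor")
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def fuse_features(content_feat, styled_feats, masks):
    """Blend per-style feature maps into the content features, region by region.

    Assumed (not from the paper): a linear blend m * styled + (1 - m) * current.
    content_feat: (C, H, W) features from the unstylized branch.
    styled_feats: list of (C, H, W) features, one per style target.
    masks: list of (H, W) arrays in [0, 1], one per style target.
    Outside every mask the content features are kept unchanged; where masks
    overlap, the later style in the list wins.
    """
    fused = content_feat.astype(float).copy()
    for feat, mask in zip(styled_feats, masks):
        if mask.shape != fused.shape[1:]:
            raise ValueError("mask must match the feature map's spatial size")
        m = mask[None, :, :]  # broadcast the mask over channels
        fused = m * feat + (1.0 - m) * fused
    return fused


# Toy usage: 4 channels, 8x8 feature maps, masks given at 32x32 "image" resolution.
content = np.zeros((4, 8, 8))
style_a = np.ones((4, 8, 8))
style_b = np.full((4, 8, 8), 2.0)

mask_a = np.zeros((32, 32))
mask_a[:, :16] = 1.0   # left half -> style A
mask_b = np.zeros((32, 32))
mask_b[:16, 16:] = 1.0  # top-right quadrant -> style B

fused = fuse_features(content, [style_a, style_b],
                      [downsample_mask(mask_a, 4), downsample_mask(mask_b, 4)])
print(fused[0])
```

In this toy run the left half takes style A, the top-right quadrant takes style B, and the bottom-right quadrant keeps the content features, mirroring the abstract's claim that unaffected regions are preserved. In a real diffusion pipeline such a blend would be applied to UNet features or latents at each denoising step; where exactly DiffStyler applies it is not stated in the abstract.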

URL

https://arxiv.org/abs/2403.18461

PDF

https://arxiv.org/pdf/2403.18461.pdf