Paper Reading AI Learner

CDST: Color Disentangled Style Transfer for Universal Style Reference Customization

2025-05-22 18:44:48
Shiwen Zhang, Zhuowei Chen, Lang Chen, Yanze Wu

Abstract

We introduce Color Disentangled Style Transfer (CDST), a novel and efficient two-stream style transfer training paradigm which completely isolates color from style and forces the style stream to be color-blinded. With one same model, CDST unlocks universal style transfer capabilities in a tuning-free manner during inference. Especially, the characteristics-preserved style transfer with style and content references is solved in the tuning-free way for the first time. CDST significantly improves the style similarity by multi-feature image embeddings compression and preserves strong editing capability via our new CDST style definition inspired by Diffusion UNet disentanglement law. By conducting thorough qualitative and quantitative experiments and human evaluations, we demonstrate that CDST achieves state-of-the-art results on various style transfer tasks.

Abstract (translated)

我们介绍了颜色解耦风格迁移(CDST),这是一种新颖且高效的双流风格迁移训练范式,它完全将颜色与风格分离,并迫使风格流对颜色失明。使用同一个模型,在推理过程中以无需微调的方式解锁了通用的风格迁移能力。特别地,通过无需微调的方法首次解决了在保持特征的前提下,利用风格和内容参考进行风格迁移的问题。CDST 通过多特征图像嵌入压缩显著提高了风格相似性,并且通过受扩散 U-Net 解耦规律启发的新 CDST 风格定义,保持了强大的编辑能力。通过对定性和定量实验以及人工评估的全面研究,我们证明了 CDST 在各种风格迁移任务中达到了最先进的成果。

URL

https://arxiv.org/abs/2506.13770

PDF

https://arxiv.org/pdf/2506.13770.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Time_Series Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot