SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization

2025-06-04 07:22:48
Junpyo Seo, Hanbin Koo, Jieun Yook, Byung-Ro Moon (Department of Computer Science, Seoul National University)

Abstract

We propose a novel diffusion-based framework for the automatic colorization of anime-style facial sketches. Our method preserves the structural fidelity of the input sketch while effectively transferring stylistic attributes from a reference image. Unlike traditional approaches that rely on predefined noise schedules, which often compromise perceptual consistency, our framework builds on continuous-time diffusion models and introduces SSIMBaD (Sigma Scaling with SSIM-Guided Balanced Diffusion). SSIMBaD applies a sigma-space transformation that makes perceptual degradation, as measured by structural similarity (SSIM), progress linearly across timesteps. This scaling ensures uniform visual difficulty at every step, enabling more balanced and faithful reconstructions. Experiments on a large-scale anime face dataset demonstrate that our method outperforms state-of-the-art models in both pixel accuracy and perceptual quality, while generalizing to diverse styles. Code is available at this http URL
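The abstract only sketches the scheduling idea, but it can be illustrated concretely: profile how SSIM between a clean image and its noised version decays as the noise level sigma grows, then pick the sigmas so that SSIM drops by equal amounts per step. The snippet below is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes a VE-style forward process x_sigma = x + sigma * eps, and the function names, sample counts, and sigma range (0.01 to 80) are all illustrative choices.

```python
# Minimal sketch: choose noise levels so that perceptual degradation,
# measured by SSIM against the clean image, decreases ~linearly per step.
# Assumptions (not from the paper): VE forward process x + sigma * eps,
# images as float arrays in [0, 1], and the sigma range below.

import numpy as np
from skimage.metrics import structural_similarity as ssim


def ssim_at_sigma(x, sigma, rng, n_samples=8):
    """Average SSIM between a clean grayscale image x (H, W) and
    noised versions x + sigma * eps, over n_samples noise draws."""
    scores = []
    for _ in range(n_samples):
        noised = x + sigma * rng.standard_normal(x.shape)
        scores.append(ssim(x, noised, data_range=noised.max() - noised.min()))
    return float(np.mean(scores))


def linear_ssim_schedule(x, n_steps=20, sigma_min=0.01, sigma_max=80.0, seed=0):
    """Return n_steps sigmas whose SSIM values are equally spaced,
    i.e. perceptual difficulty is balanced across timesteps."""
    rng = np.random.default_rng(seed)
    # 1. Profile SSIM over a dense log-spaced sigma grid.
    grid = np.geomspace(sigma_min, sigma_max, 200)
    profile = np.array([ssim_at_sigma(x, s, rng) for s in grid])
    # 2. Enforce a non-increasing profile (SSIM should fall with sigma;
    #    Monte Carlo noise can break strict monotonicity).
    profile = np.minimum.accumulate(profile)
    # 3. Target SSIM values spaced linearly from clean to fully noised.
    targets = np.linspace(profile[0], profile[-1], n_steps)
    # 4. Invert the profile by interpolation (xp must be increasing,
    #    so both arrays are reversed), then restore ascending-sigma order.
    sigmas = np.interp(targets[::-1], profile[::-1], grid[::-1])[::-1]
    return sigmas


# Usage sketch: sigmas = linear_ssim_schedule(np.clip(img_gray, 0.0, 1.0))
```

With a conventional schedule, most of the SSIM drop is concentrated in a few timesteps; inverting the measured SSIM-vs-sigma curve, as above, is one way to spread that perceptual change evenly, which is the balance the abstract describes.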

URL

https://arxiv.org/abs/2506.04283

PDF

https://arxiv.org/pdf/2506.04283.pdf

