Abstract
We propose a novel diffusion-based framework for automatic colorization of Anime-style facial sketches. Our method preserves the structural fidelity of the input sketch while effectively transferring stylistic attributes from a reference image. Unlike traditional approaches that rely on predefined noise schedules, which often compromise perceptual consistency, our framework builds on continuous-time diffusion models and introduces SSIMBaD (Sigma Scaling with SSIM-Guided Balanced Diffusion). SSIMBaD applies a sigma-space transformation that linearizes perceptual degradation, as measured by structural similarity (SSIM), across timesteps. This scaling ensures uniform visual difficulty at every timestep, enabling more balanced and faithful reconstructions. Experiments on a large-scale Anime face dataset demonstrate that our method outperforms state-of-the-art models in both pixel accuracy and perceptual quality, while generalizing to diverse styles. Code is available at this http URL.
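To make the core idea concrete, below is a minimal illustrative sketch (not the paper's implementation) of one way to build a sigma schedule whose SSIM-measured degradation is approximately linear in the timestep. It assumes an EDM-style forward process x_sigma = x0 + sigma * eps; the helper name `ssim_linear_sigmas`, the grid bounds, and the use of scikit-image's SSIM as the perceptual metric are all illustrative assumptions.

```python
# Hedged sketch: probe SSIM(x0, x0 + sigma * eps) along a dense sigma grid,
# then pick sigmas whose SSIM values are linearly spaced, so perceptual
# degradation is roughly uniform per step. Assumes an EDM-style process.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def ssim_linear_sigmas(x0, num_steps=20, sigma_min=0.002, sigma_max=80.0,
                       grid_size=200, seed=0):
    """Return sigmas (ascending) whose SSIM levels are ~linearly spaced."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(x0.shape)  # one fixed noise draw for the probe
    grid = np.geomspace(sigma_min, sigma_max, grid_size)
    # Measure perceptual degradation along the dense sigma grid.
    curve = np.array([ssim(x0, x0 + s * eps,
                           data_range=x0.max() - x0.min())
                      for s in grid])
    # Target SSIM levels: linear from cleanest to noisiest measurement.
    targets = np.linspace(curve[0], curve[-1], num_steps)
    # SSIM decreases with sigma, so reverse arrays for np.interp,
    # which requires increasing x-coordinates.
    sigmas = np.interp(targets[::-1], curve[::-1], grid[::-1])[::-1]
    return sigmas

if __name__ == "__main__":
    # Toy grayscale "image": a gradient plus mild noise, clipped to [0, 1].
    x0 = np.clip(np.linspace(0, 1, 64 * 64).reshape(64, 64)
                 + 0.1 * np.random.default_rng(1).standard_normal((64, 64)),
                 0, 1)
    print(ssim_linear_sigmas(x0, num_steps=10))
```

Compared with a log-uniform sigma grid, which concentrates most of the visible change in a narrow sigma band, spacing steps by equal SSIM drops is one way to realize the "uniform visual difficulty across timesteps" property the abstract describes.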
URL
https://arxiv.org/abs/2506.04283