Abstract
In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, for unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods on various scene translation tasks, such as day-to-night conversion and adding/removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works like ControlNet for Sketch2Photo and Edge2Image, but with single-step inference. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. Our code and models are available at this https URL.
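To make the consolidated single-step generator concrete, here is a minimal PyTorch sketch in the spirit of the architecture described above: the pretrained backbone modules stay frozen, low-rank adapters play the role of the "small trainable weights", and translation is a single denoising step rather than an iterative loop. All names (`LoRALinear`, `OneStepGenerator`, `fixed_t`) and the low-rank-adapter choice are illustrative assumptions, not the released implementation.

```python
# Hedged sketch (not the authors' released code): a consolidated one-step
# generator in the spirit of the abstract. Module names, the fixed timestep,
# and the use of low-rank adapters as the "small trainable weights" are
# illustrative assumptions.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a small trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep the pretrained weights frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))


class OneStepGenerator(nn.Module):
    """Image encoder -> single denoising step -> image decoder, fused end to end."""

    def __init__(self, encoder: nn.Module, unet: nn.Module, decoder: nn.Module,
                 fixed_t: int = 999):
        super().__init__()
        self.encoder, self.unet, self.decoder = encoder, unet, decoder
        self.fixed_t = fixed_t               # a single, fixed timestep: no iterative loop

    def forward(self, image: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        z = self.encoder(image)                                   # input image -> latent
        t = torch.full((image.shape[0],), self.fixed_t, device=image.device)
        z = self.unet(z, t, text_emb)                             # one denoising step
        return self.decoder(z)                                    # latent -> output image
```

In practice, the frozen encoder, UNet, and decoder would come from a pretrained one-step text-to-image backbone, and only the adapter (and any newly added) parameters would be updated by the adversarial objectives.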
Abstract (translated)
In this work, we address two limitations of existing conditional diffusion models: their slow inference speed caused by the iterative denoising process, and their reliance on paired data for model fine-tuning. To address these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate the various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, improving its ability to preserve the input image structure while reducing overfitting. We show that, in unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods on various scene translation tasks, such as day-to-night conversion and adding/removing weather effects (e.g., fog, snow, and rain). We extend our method to paired settings, where our model pix2pix-Turbo is comparable to recent works such as ControlNet for Sketch2Photo and Edge2Image, while requiring only a single inference step. This work shows that single-step diffusion models can serve as strong backbones for a variety of GAN learning objectives. Our code and models are available at this https URL.
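As a rough illustration of the unpaired adversarial objective mentioned above, the following sketch pairs two such one-step generators with per-domain discriminators in a CycleGAN-style loss. The non-saturating GAN formulation, the cycle-consistency weight `lam_cyc`, and the per-domain text embeddings are assumptions for illustration, not the paper's exact losses.

```python
# Hedged sketch of a CycleGAN-style unpaired objective over two one-step
# generators (G_ab: domain A -> B, G_ba: B -> A) and two discriminators.
# The non-saturating GAN loss, lam_cyc, and the per-domain text embeddings
# are assumptions for illustration, not the paper's exact losses.
import torch.nn.functional as F


def cycle_gan_losses(G_ab, G_ba, D_a, D_b, real_a, real_b, emb_a, emb_b,
                     lam_cyc: float = 10.0):
    fake_b = G_ab(real_a, emb_b)             # e.g. day image -> night image
    fake_a = G_ba(real_b, emb_a)             # night image -> day image
    rec_a = G_ba(fake_b, emb_a)              # A -> B -> A should recover the input
    rec_b = G_ab(fake_a, emb_b)              # B -> A -> B

    # Generator losses: fool the discriminators and stay cycle-consistent.
    adv = F.softplus(-D_b(fake_b)).mean() + F.softplus(-D_a(fake_a)).mean()
    cyc = F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b)
    loss_g = adv + lam_cyc * cyc

    # Discriminator losses: real images vs. detached fakes.
    loss_d = (F.softplus(-D_a(real_a)).mean() + F.softplus(D_a(fake_a.detach())).mean()
              + F.softplus(-D_b(real_b)).mean() + F.softplus(D_b(fake_b.detach())).mean())
    return loss_g, loss_d
```

The cycle-consistency term is what makes learning possible from unpaired day/night or clear/foggy collections, since no ground-truth target image exists to regress against.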
URL
https://arxiv.org/abs/2403.12036