Abstract
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks. Sampling from DMs can be seen as solving a differential equation through a discretized set of noise levels known as the sampling schedule. While past works primarily focused on deriving efficient solvers, little attention has been given to finding optimal sampling schedules, and the entire literature relies on hand-crafted heuristics. In this work, for the first time, we propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs, called $\textit{Align Your Steps}$. We leverage methods from stochastic calculus and find optimal schedules specific to different solvers, trained DMs and datasets. We evaluate our novel approach on several image, video as well as 2D toy data synthesis benchmarks, using a variety of different samplers, and observe that our optimized schedules outperform previous hand-crafted schedules in almost all experiments. Our method demonstrates the untapped potential of sampling schedule optimization, especially in the few-step synthesis regime.
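To make the notion of a sampling schedule concrete, the sketch below builds one of the hand-crafted heuristics the abstract contrasts against: a polynomial ("rho") spacing of noise levels between a maximum and a minimum, of the kind commonly used with EDM-style samplers. It is not the optimization method proposed in the paper. The function name `polynomial_schedule` and the default values of `sigma_min`, `sigma_max`, and `rho` are assumptions chosen for illustration.

```python
def polynomial_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Return n_steps noise levels, decreasing from sigma_max to sigma_min.

    Illustrative hand-crafted heuristic (hypothetical helper, not the
    paper's method): interpolate linearly in sigma**(1/rho) space, then
    raise back to the power rho. Larger rho places more steps at low noise.
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return [
        (hi + i / (n_steps - 1) * (lo - hi)) ** rho
        for i in range(n_steps)
    ]


if __name__ == "__main__":
    # A 10-level schedule, as might be used in the few-step regime.
    for sigma in polynomial_schedule(10):
        print(f"{sigma:.4f}")
```

A solver would visit these noise levels in order, one network evaluation (or more, for higher-order solvers) per step. An optimized schedule would replace this fixed formula with levels tuned to the particular solver, model, and dataset.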
URL
https://arxiv.org/abs/2404.14507