Paper Reading AI Learner

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

2024-04-22 18:18:41
Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

Abstract

Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks. Sampling from DMs can be seen as solving a differential equation through a discretized set of noise levels known as the sampling schedule. While past works primarily focused on deriving efficient solvers, little attention has been given to finding optimal sampling schedules, and the entire literature relies on hand-crafted heuristics. In this work, for the first time, we propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs, called $\textit{Align Your Steps}$. We leverage methods from stochastic calculus and find optimal schedules specific to different solvers, trained DMs and datasets. We evaluate our novel approach on several image, video as well as 2D toy data synthesis benchmarks, using a variety of different samplers, and observe that our optimized schedules outperform previous hand-crafted schedules in almost all experiments. Our method demonstrates the untapped potential of sampling schedule optimization, especially in the few-step synthesis regime.
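To make the abstract's notion of a "sampling schedule" concrete, here is a minimal sketch in plain NumPy. It builds one of the hand-crafted heuristic schedules the abstract alludes to (the rho-spaced noise levels of Karras et al.'s EDM) and steps a generic first-order Euler solver for the probability-flow ODE through it. The names `karras_schedule`, `euler_sample`, and `denoiser` are illustrative assumptions, and the dummy denoiser merely stands in for a trained diffusion model; Align Your Steps would instead supply schedule values optimized per solver, model, and dataset, which this sketch does not attempt.

```python
import numpy as np

def karras_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """A widely used hand-crafted sampling schedule (Karras et al., 2022):
    noise levels spaced uniformly in sigma^(1/rho) from sigma_max down to sigma_min."""
    ramp = np.linspace(0.0, 1.0, n_steps)
    sigmas = (sigma_max ** (1 / rho)
              + ramp * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    return np.append(sigmas, 0.0)  # final step lands at zero noise

def euler_sample(denoiser, x, sigmas):
    """Generic first-order (Euler) ODE sampler that steps through a given schedule.
    `denoiser(x, sigma)` is a placeholder for a trained diffusion model's denoiser."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoiser(x, sigma)) / sigma   # probability-flow ODE derivative
        x = x + (sigma_next - sigma) * d       # Euler step to the next noise level
    return x

if __name__ == "__main__":
    # Toy usage with a dummy "denoiser" that simply shrinks its input.
    rng = np.random.default_rng(0)
    sigmas = karras_schedule(n_steps=10)
    x0 = rng.normal(size=(4, 2)) * sigmas[0]   # start from pure noise at sigma_max
    dummy_denoiser = lambda x, sigma: x / (1.0 + sigma ** 2)
    sample = euler_sample(dummy_denoiser, x0, sigmas)
    print(sigmas.round(3))
    print(sample)
```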

URL

https://arxiv.org/abs/2404.14507

PDF

https://arxiv.org/pdf/2404.14507.pdf

