Parallel Sampling of Diffusion Models

2023-05-25 17:59:42
Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari

Abstract

Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)? In spite of the sequential nature of the denoising steps, we show that surprisingly it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. ParaDiGMS is the first diffusion sampling method that enables trading compute for speed and is even compatible with existing fast sampling techniques such as DDIM and DPMSolver. Using ParaDiGMS, we improve sampling speed by 2-4x across a range of robotics and image generation models, giving state-of-the-art sampling speeds of 0.2s on 100-step DiffusionPolicy and 16s on 1000-step StableDiffusion-v2 with no measurable degradation of task reward, FID score, or CLIP score.
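The Picard-iteration idea in the abstract can be illustrated on a toy problem. The sketch below is not ParaDiGMS itself (it has no sliding window, no diffusion model, and no noise). It only shows, under stated assumptions, how a whole trajectory of Euler steps can be guessed and then refined with batched drift evaluations until it matches the sequential result. The `drift` function is a hypothetical stand-in for a denoiser-derived drift, and the names `sequential_euler` and `picard_euler` are illustrative, not taken from the paper.

```python
import numpy as np

def drift(x, t):
    # Hypothetical toy drift; a real sampler would call the denoising network here.
    return -x + np.sin(t)

def sequential_euler(x0, ts):
    # Baseline: one drift evaluation per step, each waiting on the previous step.
    xs = [x0]
    for i in range(len(ts) - 1):
        h = ts[i + 1] - ts[i]
        xs.append(xs[-1] + h * drift(xs[-1], ts[i]))
    return np.array(xs)

def picard_euler(x0, ts, tol=1e-10):
    # Picard iteration on the Euler discretization:
    #   x_{j+1}^{(k+1)} = x_0 + sum_{i<=j} h_i * drift(x_i^{(k)}, t_i)
    n = len(ts) - 1
    hs = np.diff(ts)
    xs = np.full(n + 1, x0, dtype=float)  # initial guess: constant trajectory
    for k in range(1, n + 1):
        # All n drift evaluations are independent given the current guess,
        # so they can run as one batch (here: one vectorized call).
        d = drift(xs[:-1], ts[:-1])
        new_xs = np.concatenate(([x0], x0 + np.cumsum(hs * d)))
        if np.max(np.abs(new_xs - xs)) < tol:
            return new_xs, k
        xs = new_xs
    # After n sweeps the trajectory equals the sequential one (up to rounding).
    return xs, n

ts = np.linspace(0.0, 1.0, 51)
ref = sequential_euler(1.0, ts)
par, iters = picard_euler(1.0, ts)
```

Each sweep fixes at least one more step of the trajectory exactly, so the loop needs at most as many sweeps as there are steps. The speedup comes from cases where it converges in far fewer sweeps, at the cost of evaluating the drift for every step in every sweep.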

URL

https://arxiv.org/abs/2305.16317

PDF

https://arxiv.org/pdf/2305.16317.pdf

