Paper Reading AI Learner

ERA-Solver: Error-Robust Adams Solver for Fast Sampling of Diffusion Probabilistic Models

2023-01-30 14:32:47
Shengmeng Li, Luping Liu, Zenghao Chai, Runnan Li, Xu Tan

Abstract

Though denoising diffusion probabilistic models (DDPMs) have achieved remarkable generation results, their low sampling efficiency still limits further applications. Since DDPMs can be formulated as diffusion ordinary differential equations (ODEs), various fast sampling methods can be derived from solving diffusion ODEs. However, we notice that previous sampling methods with a fixed analytical form are not robust to the error in the noise estimated by pretrained diffusion models. In this work, we construct an error-robust Adams solver (ERA-Solver), which utilizes the implicit Adams numerical method, consisting of a predictor and a corrector. Unlike the traditional predictor based on explicit Adams methods, we leverage a Lagrange interpolation function as the predictor, which is further enhanced with an error-robust strategy that adaptively selects the Lagrange bases with lower error in the estimated noise. Experiments on the CIFAR-10, LSUN-Church, and LSUN-Bedroom datasets demonstrate that our proposed ERA-Solver achieves Fréchet Inception Distance (FID) scores of 5.14, 9.42, and 9.69, respectively, for image generation, with only 10 network evaluations.
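
To make the predictor-corrector idea in the abstract concrete, here is a minimal sketch on a toy scalar ODE. It is not the authors' implementation. The function names (`lagrange_extrapolate`, `solve`), the toy problem dy/dt = -y, and the choice of a third-order Adams-Moulton corrector are all assumptions made for illustration, and the error-robust selection of Lagrange bases is omitted entirely. The predictor extrapolates the next derivative value from stored past values with a Lagrange polynomial, and the corrector feeds that prediction into an implicit Adams formula, so each step costs one function evaluation.

```python
import math


def lagrange_extrapolate(ts, fs, t_query):
    """Evaluate the Lagrange polynomial through the points (ts[i], fs[i]) at t_query."""
    total = 0.0
    for i, (t_i, f_i) in enumerate(zip(ts, fs)):
        basis = 1.0
        for j, t_j in enumerate(ts):
            if j != i:
                basis *= (t_query - t_j) / (t_i - t_j)
        total += basis * f_i
    return total


def solve(f, y0, t0, t1, steps, order=3):
    """Integrate dy/dt = f(t, y) with a Lagrange predictor and an implicit Adams corrector.

    Hypothetical helper for illustration only: fixed step size, scalar state.
    """
    h = (t1 - t0) / steps
    ts, ys = [t0], [y0]
    fs = [f(t0, y0)]  # stored derivative values (stand-in for noise estimates)
    for _ in range(steps):
        t_next = ts[-1] + h
        # Predictor: extrapolate the derivative at t_next from up to `order` past values.
        k = min(order, len(fs))
        f_pred = lagrange_extrapolate(ts[-k:], fs[-k:], t_next)
        # Corrector: implicit Adams (Adams-Moulton) formula with the predicted value
        # standing in for the unknown derivative at t_next.
        if len(fs) >= 2:
            y_next = ys[-1] + h * (5.0 * f_pred + 8.0 * fs[-1] - fs[-2]) / 12.0
        else:
            # Only one past value is available: fall back to the trapezoidal rule.
            y_next = ys[-1] + h * (f_pred + fs[-1]) / 2.0
        ts.append(t_next)
        ys.append(y_next)
        fs.append(f(t_next, y_next))  # the single new function evaluation per step
    return ts, ys


if __name__ == "__main__":
    # Toy problem (assumption): dy/dt = -y, y(0) = 1, exact solution exp(-t).
    ts, ys = solve(lambda t, y: -y, 1.0, 0.0, 1.0, steps=10)
    print("numerical:", ys[-1], "exact:", math.exp(-1.0))
```

In a diffusion sampler the stored derivative values would be the network's noise estimates at earlier timesteps. According to the abstract, ERA-Solver additionally chooses which of those past estimates to use as interpolation bases according to their estimated error, which this sketch does not attempt.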

URL

https://arxiv.org/abs/2301.12935

PDF

https://arxiv.org/pdf/2301.12935.pdf

