Paper Reading AI Learner

$O(d/T)$ Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

2024-09-27 17:59:10
Gen Li, Yuling Yan

Abstract

Score-based diffusion models, which generate new data by learning to reverse a diffusion process that perturbs data from the target distribution into noise, have achieved remarkable success across various generative tasks. Despite their superior empirical performance, existing theoretical guarantees are often constrained by stringent assumptions or suboptimal convergence rates. In this paper, we establish a fast convergence theory for a popular SDE-based sampler under minimal assumptions. Our analysis shows that, provided $\ell_{2}$-accurate estimates of the score functions, the total variation distance between the target and generated distributions is upper bounded by $O(d/T)$ (ignoring logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. This result holds for any target distribution with finite first-order moment. To our knowledge, this improves upon existing convergence theory for both the SDE-based sampler and another ODE-based sampler, while imposing minimal assumptions on the target data distribution and score estimates. This is achieved through a novel set of analytical tools that provides a fine-grained characterization of how the error propagates at each step of the reverse process.
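For concreteness, the guarantee described in the abstract can be sketched in symbols. The display below uses the standard variance-preserving (Ornstein–Uhlenbeck) formulation of the reverse-time SDE common in this literature, with $p_t$ the law of the forward process at time $t$, $q_T$ the sampler's output distribution after $T$ steps, and $\widetilde{O}(\cdot)$ hiding logarithmic factors; the exact discretization, constants, and the dependence on the score-estimation error are given in the paper and may differ from this sketch:

```latex
% Reverse-time SDE driven by the (estimated) score function:
\mathrm{d}Y_t \;=\; \Bigl[\tfrac{1}{2}\,Y_t \;+\; \nabla \log p_{T-t}(Y_t)\Bigr]\,\mathrm{d}t \;+\; \mathrm{d}B_t

% Convergence guarantee stated in the abstract (log factors suppressed,
% assuming \ell_2-accurate score estimates):
\mathsf{TV}\bigl(p_{\mathrm{data}},\, q_T\bigr)
  \;\le\; \widetilde{O}\!\Bigl(\frac{d}{T}\Bigr)
  \;+\; \bigl(\text{term controlled by the score-estimation error}\bigr)
```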

URL

https://arxiv.org/abs/2409.18959

PDF

https://arxiv.org/pdf/2409.18959.pdf

