Abstract
Score-based diffusion models, which generate new data by learning to reverse a diffusion process that perturbs data from the target distribution into noise, have achieved remarkable success across various generative tasks. Despite their superior empirical performance, existing theoretical guarantees are often constrained by stringent assumptions or suboptimal convergence rates. In this paper, we establish a fast convergence theory for a popular SDE-based sampler under minimal assumptions. Our analysis shows that, given $\ell_{2}$-accurate estimates of the score functions, the total variation distance between the target and generated distributions is upper bounded by $O(d/T)$ (ignoring logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. This result holds for any target distribution with finite first-order moment. To our knowledge, this improves upon existing convergence theory for both the SDE-based sampler and another ODE-based sampler, while imposing minimal assumptions on the target data distribution and score estimates. This is achieved through a novel set of analytical tools that provides a fine-grained characterization of how the error propagates at each step of the reverse process.
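For concreteness, the guarantee can be written in the following schematic form. This is a hedged paraphrase of the abstract, not the paper's verbatim theorem: the constants $C, C'$, the polylogarithmic factors, and the exact dependence on the score-estimation error $\varepsilon_{\mathrm{score}}$ are placeholders. Suppose the score estimates $\{s_t\}_{t=1}^{T}$ are $\ell_2$-accurate in the averaged sense
$$\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}_{x\sim q_t}\Big[\big\|s_t(x)-\nabla\log q_t(x)\big\|_2^2\Big]\le\varepsilon_{\mathrm{score}}^2,$$
where $q_t$ denotes the marginal of the forward process at step $t$. Then the distribution $p_{\mathrm{out}}$ produced by the reverse sampler satisfies a bound of the shape
$$\mathsf{TV}\big(q_{\mathrm{data}},\,p_{\mathrm{out}}\big)\le C\,\frac{d\,\mathrm{polylog}(T)}{T}+C'\,\varepsilon_{\mathrm{score}}\,\mathrm{polylog}(T),$$
so that, with accurate scores, the discretization error decays at the $d/T$ rate stated above.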
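The "popular SDE-based sampler" in question is, in standard terminology, a DDPM-type discrete reverse process. The following numpy sketch is included only to fix ideas about what $T$ and the score estimates refer to; the function name ddpm_sample, the linear variance schedule, and the placeholder score_fn (a stand-in for a learned, $\ell_2$-accurate score estimate) are illustrative assumptions, not the paper's exact construction.

import numpy as np

def ddpm_sample(score_fn, d, T, rng=None):
    """Minimal DDPM-type reverse sampler (illustrative sketch).

    score_fn(x, t): hypothetical estimate of the score grad log q_t(x)
    at discrete noise level t; d: data dimension; T: number of steps.
    """
    rng = rng or np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, T)   # illustrative schedule, not the paper's
    alphas = 1.0 - betas

    x = rng.standard_normal(d)           # initialize from pure noise, x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):       # run the reverse process t = T, ..., 1
        # Mean update driven by the estimated score of the forward marginal q_t.
        x = (x + betas[t] * score_fn(x, t)) / np.sqrt(alphas[t])
        if t > 0:                        # no fresh noise at the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(d)
    return x

# Toy usage: if every q_t were exactly N(0, I), the true score would be -x.
sample = ddpm_sample(lambda x, t: -x, d=2, T=500)

The $O(d/T)$ rate concerns exactly the number of iterations of this loop: up to logarithmic factors, halving the total variation error requires roughly doubling $T$.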
URL
https://arxiv.org/abs/2409.18959