Abstract
Denoising Diffusion Probabilistic Models (DDPM) have shown remarkable efficacy in synthesizing high-quality images. However, their inference process typically requires many iterative steps, potentially hundreds, which can lead to exposure bias as prediction errors accumulate over the iterations. Previous work has attempted to mitigate this issue by perturbing inputs during training, which requires retraining the DDPM. In this work, we conduct a systematic study of exposure bias in diffusion models and find that it can be alleviated with a new sampling method, without retraining the model. We show empirically and theoretically that, during inference, for each backward time step $t$ and corresponding state $\hat{x}_t$, there might exist another time step $t_s$ that couples better with $\hat{x}_t$. Based on this finding, we introduce an inference method named Time-Shift Sampler. Our framework can be seamlessly integrated with existing sampling algorithms, such as DDIM or DDPM, while adding only minimal computation. Experimental results show that the proposed framework effectively improves the quality of images generated by existing sampling algorithms.
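To make the idea of a shifted time step concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state how $t_s$ is chosen, so the sketch assumes a variance-matching heuristic: within a small window around $t$, it picks the step whose forward-process noise variance $1 - \bar{\alpha}_{t_s}$ is closest to the sample variance of $\hat{x}_t$. The function names, the linear noise schedule, and the `window` and `cutoff` parameters are all hypothetical.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def select_shifted_step(x_hat, t, alpha_bar, window=10, cutoff=0):
    """Pick a time step t_s near t that better matches the state x_hat.

    Assumption (not stated in the abstract): the match is measured by
    comparing the sample variance of x_hat with the forward-process
    noise variance 1 - alpha_bar[t_s].
    """
    if t < cutoff:
        # Below the cutoff, keep the original step unchanged.
        return t
    lo = max(0, t - window // 2)
    hi = min(len(alpha_bar) - 1, t + window // 2)
    candidates = np.arange(lo, hi + 1)
    target = np.var(x_hat)
    expected = 1.0 - alpha_bar[candidates]
    # Choose the candidate whose expected variance is closest to the target.
    return int(candidates[np.argmin(np.abs(expected - target))])


if __name__ == "__main__":
    alpha_bar = make_alpha_bar()
    rng = np.random.default_rng(0)
    x_hat = rng.standard_normal((3, 32, 32)) * 0.9  # stand-in for a model state
    print(select_shifted_step(x_hat, t=500, alpha_bar=alpha_bar, window=20))
```

In a sampler loop, the returned step would replace $t$ when the network is queried for its next prediction, which is why no retraining is needed under this reading. Each step adds only a variance computation and a search over a short window.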
URL
https://arxiv.org/abs/2305.15583