Paper Reading AI Learner

Alleviating Exposure Bias in Diffusion Models through Sampling with Shifted Time Steps

2023-05-24 21:39:27
Mingxiao Li, Tingyu Qu, Wei Sun, Marie-Francine Moens

Abstract

Denoising Diffusion Probabilistic Models (DDPM) have shown remarkable efficacy in synthesizing high-quality images. However, their inference process characteristically requires numerous iterative steps, potentially hundreds, which can lead to exposure bias due to the accumulation of prediction errors over iterations. Previous work has attempted to mitigate this issue by perturbing inputs during training, which consequently requires retraining the DDPM. In this work, we conduct a systematic study of exposure bias in diffusion models and, intriguingly, find that exposure bias can be alleviated with a new sampling method, without retraining the model. We show empirically and theoretically that, during inference, for each backward time step $t$ and corresponding state $\hat{x}_t$, there may exist another time step $t_s$ that exhibits superior coupling with $\hat{x}_t$. Based on this finding, we introduce an inference method named Time-Shift Sampler. Our framework can be seamlessly integrated with existing sampling algorithms, such as DDIM or DDPM, incurring only minimal additional computation. Experimental results show that the proposed framework effectively enhances the quality of images generated by existing sampling algorithms.
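The time-shift idea in the abstract can be sketched as a small search over nearby timesteps at each denoising step. The selection criterion below, matching the empirical variance of the current state $\hat{x}_t$ against the schedule's marginal variance, is one plausible instantiation used purely for illustration; the linear beta schedule, its values, and the `window` parameter are assumptions, not the paper's exact settings.

```python
import numpy as np

# Toy linear beta schedule (illustrative values, not the paper's).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cum = np.cumprod(1.0 - betas)  # cumulative product, \bar{alpha}_t

def shifted_timestep(x_hat, t, window=10):
    """Return a time step t_s near t whose marginal variance best
    matches the empirical variance of the current state x_hat.

    For x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with roughly
    unit-variance data, Var(x_t) is about 1 - abar_t, so we compare
    x_hat's variance against 1 - alphas_cum within a small window.
    """
    lo, hi = max(t - window, 0), min(t + window, T - 1)
    candidates = np.arange(lo, hi + 1)
    sched_var = 1.0 - alphas_cum[candidates]
    return int(candidates[np.argmin(np.abs(sched_var - x_hat.var()))])
```

In a DDPM/DDIM-style loop, the selected `t_s` would replace `t` when computing the next denoising update, leaving the rest of the sampler unchanged, which is why such a scheme needs no retraining.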

URL

https://arxiv.org/abs/2305.15583

PDF

https://arxiv.org/pdf/2305.15583.pdf

