Efficient Diffusion Model for Image Restoration by Residual Shifting

2024-03-12 05:06:07
Zongsheng Yue, Jianyi Wang, Chen Change Loy

Abstract

While diffusion-based image restoration (IR) methods have achieved remarkable success, they are still limited by low inference speed, which stems from the need to execute hundreds or even thousands of sampling steps. Existing accelerated sampling techniques, though they seek to expedite the process, inevitably sacrifice performance to some extent, resulting in over-blurry restored outcomes. To address this issue, this study proposes a novel and efficient diffusion model for IR that significantly reduces the required number of diffusion steps. Our method removes the need for post-acceleration during inference, thereby avoiding the associated performance deterioration. Specifically, our proposed method establishes a Markov chain that facilitates the transitions between the high-quality and low-quality images by shifting their residuals, substantially improving the transition efficiency. A carefully formulated noise schedule is devised to flexibly control the shifting speed and the noise strength during the diffusion process. Extensive experimental evaluations demonstrate that the proposed method achieves superior or comparable performance to current state-of-the-art methods on three classical IR tasks, namely image super-resolution, image inpainting, and blind face restoration, even with only four sampling steps. Our code and model are publicly available.
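
The residual-shifting idea can be illustrated with a small sketch. It assumes a forward transition of the form x_t = x_0 + eta_t * (y_0 - x_0) + kappa * sqrt(eta_t) * noise, where x_0 is the high-quality image, y_0 the low-quality image, eta_t a monotonically increasing shifting schedule, and kappa a noise-strength factor. The abstract itself does not state this formula, and the geometric schedule, the function names, and the toy pixel values below are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def shifting_schedule(num_steps, eta_first=0.001, eta_last=0.999):
    """Hypothetical schedule: geometric growth from eta_first to eta_last."""
    if num_steps == 1:
        return [eta_last]
    ratio = (eta_last / eta_first) ** (1.0 / (num_steps - 1))
    return [eta_first * ratio ** t for t in range(num_steps)]


def shift_residual(x0, y0, eta_t, kappa, rng):
    """Sample x_t = x0 + eta_t * (y0 - x0) + kappa * sqrt(eta_t) * noise.

    x0 and y0 are flat lists of pixel values (high- and low-quality image).
    The residual (y0 - x0) is shifted in proportionally to eta_t, and the
    Gaussian noise has standard deviation kappa * sqrt(eta_t).
    """
    std = kappa * math.sqrt(eta_t)
    return [
        hq + eta_t * (lq - hq) + std * rng.gauss(0.0, 1.0)
        for hq, lq in zip(x0, y0)
    ]


# Toy example: three "pixels" and four diffusion steps.
x0 = [0.2, 0.8, 0.5]   # high-quality values (made up)
y0 = [0.3, 0.6, 0.5]   # low-quality values (made up)
etas = shifting_schedule(4)
rng = random.Random(0)
states = [shift_residual(x0, y0, eta, kappa=2.0, rng=rng) for eta in etas]
```

With eta_t close to 0 the state stays near the high-quality image, and with eta_t close to 1 it is centered on the low-quality image. Under this assumption, a short chain can start sampling from a noisy version of the degraded input rather than from pure Gaussian noise.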

URL

https://arxiv.org/abs/2403.07319

PDF

https://arxiv.org/pdf/2403.07319.pdf

