Paper Reading AI Learner

DiffIR: Efficient Diffusion Model for Image Restoration

2023-03-16 16:47:14
Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc Van Gool

Abstract

The diffusion model (DM) has achieved SOTA performance by modeling the image synthesis process as a sequential application of a denoising network. However, unlike image synthesis, which generates every pixel from scratch, most pixels in image restoration (IR) are already given. Thus, for IR, it is inefficient for traditional DMs to run massive iterations on a large model to estimate whole images or feature maps. To address this issue, we propose an efficient DM for IR (DiffIR), which consists of a compact IR prior extraction network (CPEN), a dynamic IR transformer (DIRformer), and a denoising network. Specifically, DiffIR has two training stages: pretraining and training the DM. In pretraining, we input ground-truth images into CPEN$_{S1}$ to capture a compact IR prior representation (IPR) that guides the DIRformer. In the second stage, we train the DM to directly estimate the same IPR as the pretrained CPEN$_{S1}$ using only LQ images. We observe that, since the IPR is only a compact vector, DiffIR needs fewer iterations than traditional DMs to obtain accurate estimations and generates more stable and realistic results. Because few iterations are required, our DiffIR can jointly optimize CPEN$_{S2}$, the DIRformer, and the denoising network, which further reduces the influence of estimation error. We conduct extensive experiments on several IR tasks and achieve SOTA performance at a lower computational cost.
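The key efficiency claim is that the diffusion process runs on a short IPR vector rather than a full image or feature map. The toy sketch below illustrates that idea with a standard DDPM forward process applied to a compact vector; all names (`cpen_s1`, `IPR_DIM`, `T`) and the placeholder encoder are assumptions for illustration, not the authors' code.

```python
import numpy as np

IPR_DIM = 64   # compact IPR size (illustrative; the paper's vector is likewise small)
T = 4          # few diffusion steps suffice when the state is a short vector

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, T)          # standard DDPM noise schedule
alpha_bars = np.cumprod(1.0 - betas)        # cumulative signal-retention factors

def cpen_s1(gt_image):
    """Stage-1 placeholder: extract a compact IPR from the ground-truth image.
    A real CPEN is a learned encoder; this stand-in only fixes the shapes."""
    return gt_image.reshape(-1)[:IPR_DIM] * 0.1

def q_sample(z0, t, eps):
    """DDPM forward process q(z_t | z_0) applied to the compact IPR vector."""
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps

# Stage 1: the IPR extracted from the ground truth guides the DIRformer (not shown).
gt = rng.standard_normal((16, 16))
z0 = cpen_s1(gt)

# Stage 2: diffuse the IPR; a denoising network conditioned on the LQ image
# would learn to recover z0 from z_T in only T steps.
eps = rng.standard_normal(IPR_DIM)
zT = q_sample(z0, T - 1, eps)

# Each reverse step now operates on a 64-dim vector instead of a
# full-resolution feature map, which is where the cost saving comes from.
assert z0.shape == (IPR_DIM,) and zT.shape == (IPR_DIM,)
```

The point of the sketch is the asymmetry of cost: a denoising step over `IPR_DIM` scalars is orders of magnitude cheaper than one over an H×W×C feature map, so even if per-step accuracy were unchanged, running the chain on the IPR is far less expensive.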

URL

https://arxiv.org/abs/2303.09472

PDF

https://arxiv.org/pdf/2303.09472.pdf
