BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution

2024-03-15 11:21:34
Feng Li, Yixuan Wu, Zichao Liang, Runmin Cong, Huihui Bai, Yao Zhao, Meng Wang

Abstract

Diffusion models (DMs) have shown remarkable promise in image super-resolution (SR). However, most of them are tailored to non-blind inverse problems with fixed, known degradation settings, which limits their adaptability to real-world applications that involve complex, unknown degradations. In this work, we propose BlindDiff, a DM-based blind SR method that tackles blind degradation settings in single image super-resolution (SISR). BlindDiff seamlessly integrates maximum a posteriori (MAP) optimization into DMs: it constructs a joint distribution of the low-resolution (LR) observation, high-resolution (HR) data, and degradation kernels to obtain the data and kernel priors, and solves the blind SR problem by unfolding the MAP approach along the reverse process. Unlike most DMs, BlindDiff first presents a modulated conditional transformer (MCFormer) that is pre-trained with noise and kernel constraints and then serves as a posterior sampler that provides both priors simultaneously. We then plug a simple yet effective kernel-aware gradient term between adjacent sampling iterations, which guides the diffusion model to learn degradation consistency knowledge. This also enables joint refinement of the degradation model and the HR image by observing the previous denoised sample. With the MAP-based reverse diffusion process, we show that BlindDiff performs alternating optimization of blur kernel estimation and HR image restoration in a mutually reinforcing manner. Experiments on both synthetic and real-world datasets show that BlindDiff achieves state-of-the-art performance with a significant reduction in model complexity compared to recent DM-based methods. Code will be made available.
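
The alternation described in the abstract, between blur kernel estimation and HR restoration driven by a degradation-consistency gradient, can be illustrated with a toy sketch. The block below is not the authors' implementation. It works on a 1-D signal with circular blur, omits the diffusion sampler and the MCFormer priors entirely, and takes plain gradient steps on the data-fidelity term 0.5 * ||downsample(k * x) - y||^2. All function names (`degrade`, `loss`, `grads`, `project_kernel`, `alternate`) and the clip-and-renormalise kernel constraint are assumptions made for illustration only.

```python
def degrade(x, k, scale):
    """Circularly blur signal x with kernel k, then keep every `scale`-th sample."""
    n = len(x)
    blurred = [sum(k[m] * x[(i - m) % n] for m in range(len(k))) for i in range(n)]
    return blurred[::scale]


def loss(x, k, y, scale):
    """Data-fidelity term 0.5 * ||degrade(x, k) - y||^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(degrade(x, k, scale), y))


def grads(x, k, y, scale):
    """Gradients of the loss with respect to the signal x and the kernel k."""
    n = len(x)
    residual = [a - b for a, b in zip(degrade(x, k, scale), y)]
    g_x = [0.0] * n
    g_k = [0.0] * len(k)
    for j, r in enumerate(residual):
        i = j * scale  # index of the blurred sample that survived downsampling
        for m in range(len(k)):
            g_x[(i - m) % n] += r * k[m]
            g_k[m] += r * x[(i - m) % n]
    return g_x, g_k


def project_kernel(k):
    """Clip to non-negative values and renormalise to sum 1 (a simple heuristic)."""
    k = [max(v, 0.0) for v in k]
    total = sum(k)
    return [v / total for v in k] if total > 0 else [1.0 / len(k)] * len(k)


def alternate(y, n, ksize, scale, iters=50):
    """Alternate a signal update and a kernel update, recording the loss."""
    x = [y[i // scale] for i in range(n)]  # nearest-neighbour upsampling as initial guess
    k = [1.0 / ksize] * ksize              # flat initial kernel
    history = [loss(x, k, y, scale)]
    for _ in range(iters):
        # Signal step. A unit step is safe here because a non-negative kernel
        # that sums to 1 gives a blur operator with norm at most 1.
        g_x, _unused = grads(x, k, y, scale)
        x = [xi - gi for xi, gi in zip(x, g_x)]

        # Kernel step with backtracking: accept only a candidate that lowers the loss.
        _unused, g_k = grads(x, k, y, scale)
        current, step = loss(x, k, y, scale), 1.0
        for _attempt in range(30):
            candidate = project_kernel([ki - step * gi for ki, gi in zip(k, g_k)])
            if loss(x, candidate, y, scale) < current:
                k = candidate
                break
            step *= 0.5
        history.append(loss(x, k, y, scale))
    return x, k, history


if __name__ == "__main__":
    x_true = [((7 * i) % 11) / 11.0 for i in range(32)]  # synthetic test signal
    k_true = [0.2, 0.5, 0.3]                             # synthetic blur kernel
    y = degrade(x_true, k_true, 2)
    x_hat, k_hat, history = alternate(y, n=32, ksize=3, scale=2)
    print("loss before:", history[0], "loss after:", history[-1])
    print("estimated kernel:", k_hat)
```

Because blind SR is ill-posed, lowering the data-fidelity term alone does not guarantee that the recovered kernel or signal matches the ground truth. That gap is what learned priors are meant to close, and in BlindDiff, according to the abstract, those priors come from the pre-trained MCFormer.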


URL

https://arxiv.org/abs/2403.10211

PDF

https://arxiv.org/pdf/2403.10211.pdf

