Paper Reading AI Learner

Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

2024-03-17 04:08:58
Jialu Sui, Xianping Ma, Xiaokang Zhang, Man-On Pun

Abstract

Remote sensing image super-resolution (SR) is a crucial task that restores high-resolution (HR) images from low-resolution (LR) observations. Recently, the Denoising Diffusion Probabilistic Model (DDPM) has shown promising performance in image reconstruction by overcoming problems inherent in generative models, such as over-smoothing and mode collapse. However, the high-frequency details generated by DDPM are often misaligned with the HR images because the model tends to overlook long-range semantic context. This is attributed to the widely used U-Net decoder in the conditional noise predictor, which tends to overemphasize local information and therefore predicts noise with significant variance. To address these issues, an adaptive semantic-enhanced DDPM (ASDDPM) is proposed to enhance the detail-preserving capability of the DDPM by incorporating low-frequency semantic information provided by the Transformer. Specifically, a novel adaptive diffusion Transformer decoder (ADTD) is developed to bridge the semantic gap between the encoder and decoder by regulating the noise prediction with the global contextual relationships and long-range dependencies in the diffusion process. Additionally, a residual feature fusion strategy establishes information exchange between the two decoders at multiple levels. As a result, the noise predicted by our approach closely approximates the real noise distribution. Extensive experiments on two SR and two semantic segmentation datasets confirm the superior performance of the proposed ASDDPM in both SR and the subsequent downstream applications. The source code will be available at this https URL.
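The noise-prediction setting the abstract refers to can be illustrated with the standard DDPM forward process. The sketch below is not the authors' code: it uses only the Python standard library, treats an image as a flat list of pixel values, and assumes a linear variance schedule. All function names are hypothetical. The ASDDPM predictor itself (a U-Net encoder with U-Net and Transformer decoders, conditioned on the LR image) is not reproduced here; the loss function only shows the quantity such a predictor would be trained to minimize.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, eps)."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def noise_prediction_loss(eps_true, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    hr_pixels = [0.1, -0.5, 0.9, 0.0]  # toy stand-in for an HR image
    xt, eps = forward_noise(hr_pixels, 500, alpha_bars, rng)
    # A placeholder "predictor" that always outputs zeros, for illustration.
    print(noise_prediction_loss(eps, [0.0] * len(eps)))
```

In this framing, a predictor whose output has high variance or ignores global context would show up as a larger gap between the predicted and the true noise, which is the gap the abstract says the ADTD is meant to reduce.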

URL

https://arxiv.org/abs/2403.11078

PDF

https://arxiv.org/pdf/2403.11078.pdf

