Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

2024-05-08 11:09:24
Yi Xiao, Qiangqiang Yuan, Kui Jiang, Yuzeng Chen, Qiang Zhang, Chia-Wen Lin

Abstract

Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs on large-scale RSI. To alleviate these issues, we make the first attempt to integrate the Vision State Space Model (Mamba) into RSI-SR; Mamba is well suited to processing large-scale RSI because it captures long-range dependencies with linear complexity. To achieve better SR reconstruction, we build upon Mamba and devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore spatial and frequency correlations. In particular, FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM), combining their merits for effective spatial-frequency fusion. Recognizing that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features via learnable scaling adaptors for accurate feature fusion. Extensive experiments on the AID, DOTA, and DIOR benchmarks demonstrate that FMSR outperforms the state-of-the-art Transformer-based method HAT-L in terms of PSNR by 0.11 dB on average, while requiring only 28.05% of its memory and 19.08% of its computational complexity.
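
For readers unfamiliar with the metric, the following minimal sketch shows the standard PSNR definition and how a PSNR gain in dB relates to mean squared error (MSE). It is not taken from the paper: the function names `psnr` and `mse_ratio_for_gain`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration only.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are passed as flat sequences of pixel intensities
    (an illustrative simplification, not the paper's evaluation code).
    """
    if len(reference) == 0 or len(reference) != len(estimate):
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain (same peak value)."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    hr = [52, 60, 61, 70]   # hypothetical ground-truth pixels
    sr = [50, 61, 63, 69]   # hypothetical super-resolved pixels
    print(f"PSNR: {psnr(hr, sr):.2f} dB")
    print(f"MSE ratio for +0.11 dB: {mse_ratio_for_gain(0.11):.4f}")
```

Under this definition, a 0.11 dB gain on a single image corresponds to roughly 2.5% lower MSE. The reported figure is an average over benchmarks, so the per-image relationship is only a rough guide.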

URL

https://arxiv.org/abs/2405.04964

PDF

https://arxiv.org/pdf/2405.04964.pdf

