DVMSR: Distillated Vision Mamba for Efficient Super-Resolution

2024-05-05 17:34:38
Xiaoyan Lei, Wenlong Zhang, Weifeng Cao

Abstract

Efficient image super-resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art efficient SR methods are based on convolutional neural networks. Few attempts have been made to apply Mamba to this task, even though its long-range modeling capability and efficient computational complexity have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight image SR network that incorporates Vision Mamba and a distillation strategy. The DVMSR network consists of three modules: a feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several RSSBs, each of which contains several Vision Mamba Modules (ViMM) together with a residual connection. To improve efficiency while maintaining comparable performance, we apply a distillation strategy to the Vision Mamba network: we leverage the rich representation knowledge of a teacher network as additional supervision for the output of the lightweight student network. Extensive experiments demonstrate that the proposed DVMSR outperforms state-of-the-art efficient SR methods in terms of model parameters while maintaining performance in both PSNR and SSIM. The source code is available at this https URL
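
The two structural ideas in the abstract, a residual connection around a stack of modules and teacher output used as additional supervision for the student, can be illustrated with a minimal sketch. This is not the authors' implementation: the names residual_block, l1 and distillation_loss, the use of an L1 distance, and the weighting factor alpha are all assumptions made for illustration, and the inner modules are placeholders for the Vision Mamba Modules.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def residual_block(layers: Sequence[Layer]) -> Layer:
    """Hypothetical RSSB-style wrapper: run the inner layers in order,
    then add the block input back (the residual connection)."""
    def forward(x: Vector) -> Vector:
        y = x
        for layer in layers:  # placeholders standing in for ViMMs
            y = layer(y)
        return [a + b for a, b in zip(x, y)]
    return forward


def l1(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equally sized vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def distillation_loss(student_out: Vector, teacher_out: Vector,
                      target: Vector, alpha: float = 0.5) -> float:
    """Assumed loss form: a reconstruction term against the ground truth
    plus a term pulling the student output toward the teacher output.
    The weight alpha is an illustrative choice, not taken from the paper."""
    return ((1.0 - alpha) * l1(student_out, target)
            + alpha * l1(student_out, teacher_out))


# Toy usage with flattened "images" of two pixels each.
block = residual_block([lambda v: [2.0 * t for t in v]])
features = block([1.0, 2.0])
loss = distillation_loss([1.0, 2.0], [1.0, 4.0], [2.0, 2.0], alpha=0.5)
```

In this sketch, setting alpha to 0 reduces the loss to plain supervised training, while larger values give the teacher output more influence over the student.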

URL

https://arxiv.org/abs/2405.03008

PDF

https://arxiv.org/pdf/2405.03008.pdf

