Paper Reading AI Learner

Multi-Scale Implicit Transformer with Re-parameterize for Arbitrary-Scale Super-Resolution

2024-03-11 09:23:20
Jinchen Zhu, Mingjian Zhang, Ling Zheng, Shizhuang Weng

Abstract

Recently, methods based on implicit neural representations have shown excellent capabilities for arbitrary-scale super-resolution (ASSR). Although these methods represent image features by generating latent codes, the latent codes are difficult to adapt to different super-resolution magnification factors, which seriously limits their performance. To address this, we design the Multi-Scale Implicit Transformer (MSIT), consisting of a Multi-Scale Neural Operator (MSNO) and Multi-Scale Self-Attention (MSSA). MSNO obtains multi-scale latent codes through feature enhancement, multi-scale characteristics extraction, and multi-scale characteristics merging; MSSA then further enhances the multi-scale characteristics of the latent codes, resulting in better performance. Furthermore, to improve network performance, we propose the Re-Interaction Module (RIM), combined with a cumulative training strategy, to increase the diversity of information learned by the network. We systematically introduce multi-scale characteristics into ASSR for the first time. Extensive experiments validate the effectiveness of MSIT, and our method achieves state-of-the-art performance on arbitrary-scale super-resolution tasks.
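The abstract describes an architecture in which MSNO extracts latent codes at several scales (feature enhancement, multi-scale extraction, multi-scale merging) and MSSA then refines those codes before they are queried at arbitrary target resolutions. As a rough, non-authoritative sketch of that idea only, the PyTorch snippet below shows one plausible way multi-scale latent codes could be produced and queried at continuous coordinates; every module name, layer choice, and hyper-parameter here is an assumption for illustration and is not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only: the actual MSNO/MSSA design is defined in the paper.
# All layer choices, channel counts, and scale factors below are assumptions.
class MultiScaleLatentEncoder(nn.Module):
    """Extract features at several scales and merge them into latent codes."""

    def __init__(self, in_ch=3, dim=64, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.enhance = nn.Conv2d(in_ch, dim, 3, padding=1)        # feature enhancement
        self.branches = nn.ModuleList(
            [nn.Conv2d(dim, dim, 3, padding=1) for _ in scales]   # per-scale extraction
        )
        self.merge = nn.Conv2d(dim * len(scales), dim, 1)         # multi-scale merging

    def forward(self, lr_image):
        feat = self.enhance(lr_image)
        h, w = feat.shape[-2:]
        multi = []
        for s, branch in zip(self.scales, self.branches):
            # Downsample, extract per-scale features, then upsample back to a common size.
            x = F.interpolate(feat, scale_factor=1 / s, mode="bilinear",
                              align_corners=False) if s > 1 else feat
            x = branch(x)
            if s > 1:
                x = F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)
            multi.append(x)
        return self.merge(torch.cat(multi, dim=1))  # merged multi-scale latent codes


def query_latent_codes(latent, coords):
    """Sample latent codes at continuous target coordinates in [-1, 1],
    the generic grid-sample trick used by many implicit-SR methods."""
    grid = coords.unsqueeze(1)                                    # (B, 1, N, 2)
    sampled = F.grid_sample(latent, grid, mode="bilinear", align_corners=False)
    return sampled.squeeze(2).permute(0, 2, 1)                    # (B, N, C)


if __name__ == "__main__":
    enc = MultiScaleLatentEncoder()
    lr = torch.randn(1, 3, 48, 48)
    latent = enc(lr)                                  # (1, 64, 48, 48)
    coords = torch.rand(1, 1024, 2) * 2 - 1           # arbitrary-scale query points
    print(query_latent_codes(latent, coords).shape)   # torch.Size([1, 1024, 64])
```

Here a plain grid-sample query stands in for what the paper implements with MSSA and RIM; the sketch is only meant to show where multi-scale latent codes would sit in an arbitrary-scale super-resolution pipeline.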

URL

https://arxiv.org/abs/2403.06536

PDF

https://arxiv.org/pdf/2403.06536.pdf
