Paper Reading AI Learner

HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution

2024-05-08 12:14:34
Shu-Chuan Chu, Zhi-Chao Dou, Jeng-Shyang Pan, Shaowei Weng, Junbao Li

Abstract

Transformer-based methods have demonstrated excellent performance on super-resolution visual tasks, surpassing conventional convolutional neural networks. However, existing work typically restricts self-attention computation to non-overlapping windows to save computational costs. This means that Transformer-based networks can only use input information from a limited spatial range. Therefore, a novel Hybrid Multi-Axis Aggregation network (HMA) is proposed in this paper to exploit feature potential information better. HMA is constructed by stacking Residual Hybrid Transformer Blocks(RHTB) and Grid Attention Blocks(GAB). On the one side, RHTB combines channel attention and self-attention to enhance non-local feature fusion and produce more attractive visual results. Conversely, GAB is used in cross-domain information interaction to jointly model similar features and obtain a larger perceptual field. For the super-resolution task in the training phase, a novel pre-training method is designed to enhance the model representation capabilities further and validate the proposed model's effectiveness through many experiments. The experimental results show that HMA outperforms the state-of-the-art methods on the benchmark dataset. We provide code and models at this https URL.

Abstract (translated)

基于Transformer的方法在超分辨率视觉任务上已经表现出卓越的性能,超过了传统的卷积神经网络。然而,现有的工作通常将自注意力计算限制在非重叠的窗口以节省计算成本。这意味着基于Transformer的网络只能利用输入信息的有限空间范围。因此,本文提出了一种新颖的混合多轴聚合网络(HMA)来更好地利用特征潜力信息。HMA由Residual Hybrid Transformer Blocks(RHTB)和Grid Attention Blocks(GAB)堆叠而成。一方面,RHTB通过结合通道关注和自注意力来增强非局部特征融合,产生更具吸引力的视觉结果。另一方面,GAB用于跨域信息交互,共同建模类似特征,获得更大的感知场。在训练阶段,为增强模型表示能力并验证所提出的模型的有效性,设计了一种新颖的预训练方法。实验结果表明,HMA在基准数据集上优于最先进的方法。我们在这个网址提供了代码和模型。

URL

https://arxiv.org/abs/2405.05001

PDF

https://arxiv.org/pdf/2405.05001.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot