Paper Reading AI Learner

Partial Large Kernel CNNs for Efficient Super-Resolution

2024-04-18 01:55:44
Dongheon Lee, Seokju Yun, Youngmin Ro

Abstract

Recently, in the super-resolution (SR) domain, Transformers have outperformed CNNs with fewer FLOPs and fewer parameters, since they can model long-range dependencies and adaptively adjust weights per instance. In this paper, we demonstrate that CNNs, although less emphasized in the current SR domain, surpass Transformers in direct efficiency measures. By incorporating the advantages of Transformers into CNNs, we aim to achieve both computational efficiency and enhanced performance. However, using a large kernel in the SR domain, which mainly processes large images, incurs a large computational overhead. To overcome this, we propose novel approaches to employing large kernels that reduce latency by 86% compared to a naive large kernel, and we leverage an Element-wise Attention module to imitate instance-dependent weights. As a result, we introduce Partial Large Kernel CNNs for Efficient Super-Resolution (PLKSR), which achieves state-of-the-art performance on four datasets at a scale of ×4, with reductions of 68.1% in latency and 80.2% in maximum GPU memory occupancy compared to SRFormer-light.
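The two ideas named in the abstract can be sketched in PyTorch: a partial large-kernel convolution that applies an expensive large kernel to only a slice of the channels (passing the rest through unchanged, which is one way to cut the cost of a naive full-channel large kernel), and an element-wise attention gate that modulates features per position to imitate instance-dependent weights. This is a minimal illustrative sketch, not the paper's exact architecture; the class names, the 17×17 kernel size, and the 0.25 split ratio are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class PartialLargeKernelConv(nn.Module):
    """Apply a large-kernel conv to only the first `split_ratio` fraction
    of channels; the remaining channels are passed through untouched."""

    def __init__(self, dim: int, kernel_size: int = 17, split_ratio: float = 0.25):
        super().__init__()
        self.conv_ch = int(dim * split_ratio)  # channels that see the large kernel
        self.conv = nn.Conv2d(
            self.conv_ch, self.conv_ch, kernel_size, padding=kernel_size // 2
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along the channel dimension, convolve one part, concatenate back.
        x1, x2 = torch.split(x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)


class ElementwiseAttention(nn.Module):
    """Per-element sigmoid gate: a cheap, input-dependent reweighting that
    mimics the instance-adaptive weights of self-attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.gate(x))


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    block = nn.Sequential(PartialLargeKernelConv(64), ElementwiseAttention(64))
    print(block(x).shape)  # feature map shape is preserved
```

Restricting the large kernel to a channel slice keeps the receptive-field benefit on part of the representation while the identity branch costs nothing, which is the intuition behind the latency reduction the abstract reports.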

URL

https://arxiv.org/abs/2404.11848

PDF

https://arxiv.org/pdf/2404.11848.pdf
