Paper Reading AI Learner

CFAT: Unleashing Triangular Windows for Image Super-resolution

2024-03-24 13:31:31
Abhisek Ray, Gaurav Kumar, Maheshkumar H. Kolekar

Abstract

Transformer-based models have revolutionized the field of image super-resolution (SR) by harnessing their inherent ability to capture complex contextual features. The overlapping rectangular shifted-window technique used in transformer architectures is now common practice in super-resolution models to improve the quality and robustness of image upscaling. However, it suffers from distortion at window boundaries and offers only a limited number of unique shifting modes. To overcome these weaknesses, we propose a non-overlapping triangular window technique that works synchronously with the rectangular one to mitigate boundary-level distortion and gives the model access to more unique shifting modes. In this paper, we propose a Composite Fusion Attention Transformer (CFAT) that combines triangular- and rectangular-window-based local attention with a channel-based global attention technique for image super-resolution. As a result, CFAT activates attention mechanisms over more image pixels and captures long-range, multi-scale features to improve SR performance. Extensive experimental results and an ablation study demonstrate the effectiveness of CFAT in the SR domain. Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
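The listing does not include code, so the following is a minimal, hypothetical PyTorch sketch of the idea named in the abstract: partitioning non-overlapping square windows into four triangular sub-regions and running self-attention separately inside each triangle. The names `triangular_masks` and `TriangularWindowAttention`, the window size, and the head count are illustrative assumptions, not CFAT's actual implementation, which additionally fuses rectangular shifted-window attention with channel-wise global attention.

```python
import torch
import torch.nn as nn


def triangular_masks(window: int) -> torch.Tensor:
    """Label every pixel of a (window x window) patch with one of four
    non-overlapping triangles (0=top, 1=right, 2=bottom, 3=left), obtained by
    cutting the square along its two diagonals."""
    ii, jj = torch.meshgrid(torch.arange(window), torch.arange(window), indexing="ij")
    di = ii.float() - (window - 1) / 2.0   # signed vertical offset from the centre
    dj = jj.float() - (window - 1) / 2.0   # signed horizontal offset from the centre
    labels = torch.empty(window, window, dtype=torch.long)
    labels[(di <= 0) & (di.abs() >= dj.abs())] = 0  # top triangle
    labels[(dj > 0) & (dj.abs() > di.abs())] = 1    # right triangle
    labels[(di > 0) & (di.abs() >= dj.abs())] = 2   # bottom triangle
    labels[(dj <= 0) & (dj.abs() > di.abs())] = 3   # left triangle
    return labels.flatten()                          # shape: (window * window,)


class TriangularWindowAttention(nn.Module):
    """Sketch (hypothetical names): local self-attention restricted to the four
    triangular sub-regions of each non-overlapping square window."""

    def __init__(self, dim: int, window: int = 8, heads: int = 4):
        super().__init__()
        assert dim % heads == 0, "embedding dim must be divisible by heads"
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.register_buffer("tri", triangular_masks(window))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C), with H and W divisible by the window size.
        B, H, W, C = x.shape
        w = self.window
        # Split the feature map into non-overlapping w x w windows and
        # flatten each window into a sequence of w*w tokens.
        xw = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        xw = xw.reshape(-1, w * w, C)
        out = torch.zeros_like(xw)
        for t in range(4):  # attend separately inside each triangle
            idx = (self.tri == t).nonzero(as_tuple=True)[0]
            tokens = xw[:, idx, :]
            attended, _ = self.attn(tokens, tokens, tokens)
            out[:, idx, :] = attended
        # Undo the windowing to recover the (B, H, W, C) layout.
        out = out.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(B, H, W, C)


if __name__ == "__main__":
    feats = torch.randn(1, 32, 32, 64)              # toy feature map
    layer = TriangularWindowAttention(dim=64, window=8, heads=4)
    print(layer(feats).shape)                       # torch.Size([1, 32, 32, 64])
```

Looping over the four triangles keeps the sketch simple and makes no assumption that the triangles hold equal numbers of pixels; the fusion of this branch with rectangular shifted-window attention and channel-based global attention described in the abstract is intentionally omitted here.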


URL

https://arxiv.org/abs/2403.16143

PDF

https://arxiv.org/pdf/2403.16143.pdf

