TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising

2024-04-11 15:39:10
Junyi Li, Zhilu Zhang, Wangmeng Zuo

Abstract

Blind-spot networks (BSNs) are a prevalent network architecture in self-supervised image denoising (SSID). Existing BSNs are mostly built from convolution layers. Although transformers offer potential solutions to the limitations of convolutions and have demonstrated success in various image restoration tasks, their attention mechanisms may violate the blind-spot requirement, which restricts their applicability in SSID. In this paper, we present a transformer-based blind-spot network (TBSN) by analyzing and redesigning the transformer operators so that they meet the blind-spot requirement. Specifically, TBSN follows the architectural principles of dilated BSNs and incorporates spatial as well as channel self-attention layers to enhance the network capability. For spatial self-attention, an elaborate mask is applied to the attention matrix to restrict its receptive field, thus mimicking the dilated convolution. For channel self-attention, we observe that it may leak the blind-spot information when the channel number is greater than the spatial size in the deep layers of multi-scale architectures. To eliminate this effect, we divide the channels into several groups and perform channel attention on each group separately. Furthermore, we introduce a knowledge distillation strategy that distills TBSN into smaller denoisers to improve computational efficiency while maintaining performance. Extensive experiments on real-world image denoising datasets show that TBSN largely extends the receptive field and performs favorably against state-of-the-art SSID methods. The code and pre-trained models will be publicly available at this https URL.
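
The grouped channel attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 1/sqrt(N) scaling, and the use of plain Python lists in place of tensors are all assumptions made to keep the example dependency-free. Each of q, k, v is a list of C channels with N spatial values per channel. Ungrouped channel attention builds one C x C attention matrix, while the grouped variant splits the channels into equal groups and attends within each group only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(q, k, v):
    """Channel self-attention over C channels with N spatial values each.

    The attention matrix is C x C: every output channel is a weighted
    mix of all value channels. The 1/sqrt(N) scale is an assumption.
    """
    n = len(q[0])
    scale = 1.0 / math.sqrt(n)
    out = []
    for qi in q:
        # Similarity of this query channel to every key channel.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Weighted sum of value channels at each spatial position.
        out.append([sum(w * vj[p] for w, vj in zip(weights, v))
                    for p in range(n)])
    return out


def grouped_channel_attention(q, k, v, num_groups):
    """Split the channels into num_groups equal groups and run channel
    attention inside each group, so channels never mix across groups."""
    c = len(q)
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = c // num_groups
    out = []
    for start in range(0, c, size):
        group = slice(start, start + size)
        out.extend(channel_attention(q[group], k[group], v[group]))
    return out
```

With num_groups=1 the function reduces to ordinary channel attention. Larger values shrink each attention matrix to (C/num_groups) x (C/num_groups), which limits how many channels can be mixed in a single layer. How the group size should relate to the spatial size in order to prevent leakage is not stated in the abstract, so it is left as a free parameter here.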

URL

https://arxiv.org/abs/2404.07846

PDF

https://arxiv.org/pdf/2404.07846.pdf

