Paper Reading AI Learner

Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer

2022-04-30 09:01:21
Wulian Yun, Mengshi Qi, Chuanming Wang, Huiyuan Fu, Huadong Ma

Abstract

Video denoising aims to recover high-quality frames from the noisy video. While most existing approaches adopt convolutional neural networks(CNNs) to separate the noise from the original visual content, however, CNNs focus on local information and ignore the interactions between long-range regions. Furthermore, most related works directly take the output after spatio-temporal denoising as the final result, neglecting the fine-grained denoising process. In this paper, we propose a Dual-stage Spatial-Channel Transformer (DSCT) for coarse-to-fine video denoising, which inherits the advantages of both Transformer and CNNs. Specifically, DSCT is proposed based on a progressive dual-stage architecture, namely a coarse-level and a fine-level to extract dynamic feature and static feature, respectively. At both stages, a Spatial-Channel Encoding Module(SCEM) is designed to model the long-range contextual dependencies at spatial and channel levels. Meanwhile, we design a Multi-scale Residual Structure to preserve multiple aspects of information at different stages, which contains a Temporal Features Aggregation Module(TFAM) to summarize the dynamic representation. Extensive experiments on four publicly available datasets demonstrate our proposed DSCT achieves significant improvements compared to the state-of-the-art methods.

Abstract (translated)

URL

https://arxiv.org/abs/2205.00214

PDF

https://arxiv.org/pdf/2205.00214.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot