Paper Reading AI Learner

Sampling-Priors-Augmented Deep Unfolding Network for Robust Video Compressive Sensing

2023-07-14 12:05:14
Yuhao Huang, Gangrong Qu, Youran Ge

Abstract

Video Compressive Sensing (VCS) aims to reconstruct multiple frames from a single captured measurement, thus achieving high-speed scene recording with a low-frame-rate sensor. Although there have been impressive advances in VCS recently, these state-of-the-art (SOTA) methods also significantly increase model complexity and suffer from poor generality and robustness, meaning that the networks must be retrained to accommodate a new system. Such limitations hinder real-time imaging and the practical deployment of models. In this work, we propose a Sampling-Priors-Augmented Deep Unfolding Network (SPA-DUN) for efficient and robust VCS reconstruction. Under the optimization-inspired deep unfolding framework, a lightweight and efficient U-net is exploited to downsize the model while improving overall performance. Moreover, the prior knowledge from the sampling model is utilized to dynamically modulate the network features, enabling a single SPA-DUN to handle arbitrary sampling settings and augmenting interpretability and generality. Extensive experiments on both simulated and real datasets demonstrate that SPA-DUN is not only applicable to various sampling settings with one single model but also achieves SOTA performance with remarkable efficiency.
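The abstract refers to the standard VCS sampling model, in which a sequence of masked high-speed frames is summed into one low-frame-rate snapshot that the network must then invert. The following is a minimal NumPy sketch of that forward model only; the function name, shapes, and binary-mask choice are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def vcs_measure(frames, masks):
    """Compress T frames into a single 2D snapshot measurement.

    frames: (T, H, W) block of high-speed video frames
    masks:  (T, H, W) per-frame sampling masks (binary here)
    returns the (H, W) measurement y = sum_t masks[t] * frames[t]
    """
    return np.sum(masks * frames, axis=0)

# Toy example: 8 frames of a 4x4 scene collapse into one 4x4 measurement.
rng = np.random.default_rng(0)
T, H, W = 8, 4, 4
frames = rng.random((T, H, W))
masks = rng.integers(0, 2, size=(T, H, W)).astype(float)
y = vcs_measure(frames, masks)
print(y.shape)  # (4, 4)
```

Reconstruction is the ill-posed inverse of this map (T unknowns per pixel, one equation), which is why deep unfolding methods such as SPA-DUN alternate data-consistency steps against this model with a learned denoising prior.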


URL

https://arxiv.org/abs/2307.07291

PDF

https://arxiv.org/pdf/2307.07291.pdf

