Towards Real-time Video Compressive Sensing on Mobile Devices

2024-08-14 13:03:31
Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, Xin Yuan

Abstract

Video Snapshot Compressive Imaging (SCI) uses a low-speed 2D camera to capture a high-speed scene as snapshot compressed measurements; a reconstruction algorithm then recovers the high-speed video frames. Rapidly evolving mobile devices and existing high-performance video SCI reconstruction algorithms motivate us to develop mobile reconstruction methods for real-world applications. Yet deploying previous reconstruction algorithms on mobile devices remains challenging because of their complex inference process, and real-time mobile reconstruction is harder still. To the best of our knowledge, no video SCI reconstruction model has been designed to run on mobile devices. To this end, we present an effective approach for video SCI reconstruction, dubbed MobileSCI, which runs at real-time speed on mobile devices for the first time. Specifically, we first build a U-shaped 2D convolution-based architecture, which is much more efficient and mobile-friendly than previous state-of-the-art reconstruction methods. In addition, an efficient feature mixing block, based on channel splitting and shuffling, is introduced as a novel bottleneck block of MobileSCI to reduce the computational burden. Finally, a customized knowledge distillation strategy is used to further improve reconstruction quality. Extensive results on both simulated and real data show that MobileSCI achieves superior reconstruction quality with high efficiency on mobile devices. In particular, it reconstructs a 256 × 256 × 8 snapshot compressed measurement in real time (about 35 FPS) on an iPhone 15. Code is available at this https URL.
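
The two mechanisms the abstract names can be illustrated with a small sketch. It is not the MobileSCI code. The first function assumes the measurement model commonly used in the video SCI literature: each of the T frames is multiplied element-wise by its own mask, and the results are summed into one 2D snapshot. The second is a generic ShuffleNet-style channel shuffle, shown only to make "channel shuffling" concrete; the abstract does not specify the actual feature mixing block. The function names, the random binary masks, and the pure-Python list representation are all assumptions made for illustration.

```python
import random


def random_binary_masks(num_frames, height, width, seed=0):
    """Hypothetical helper: one random 0/1 mask per frame (an assumed mask type)."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


def sci_measure(frames, masks):
    """Collapse T frames into one snapshot: Y[i][j] = sum_t masks[t][i][j] * frames[t][i][j]."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                snapshot[i][j] += mask[i][j] * frame[i][j]
    return snapshot


def channel_shuffle(channels, groups):
    """Interleave channels across groups (generic ShuffleNet-style shuffle).

    Equivalent to viewing the channel list as (groups, per_group),
    transposing to (per_group, groups), and flattening.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [
        channels[g * per_group + k]
        for k in range(per_group)
        for g in range(groups)
    ]
```

With T = 8 frames of size 256 × 256, `sci_measure` returns a single 256 × 256 snapshot, which matches the 256 × 256 × 8 setting quoted in the abstract; the reconstruction network then has to invert this many-to-one mapping.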

URL

https://arxiv.org/abs/2408.07530

PDF

https://arxiv.org/pdf/2408.07530.pdf
