Paper Reading AI Learner

Perceptual Coding for Compressed Video Understanding: A New Framework and Benchmark

2022-02-06 16:29:15
Yuan Tian, Guo Lu, Yichao Yan, Guangtao Zhai, Li Chen, Zhiyong Gao

Abstract

Most video understanding methods are learned on high-quality videos. However, in most real-world scenarios, the videos are first compressed before the transportation and then decompressed for understanding. The decompressed videos are degraded in terms of perceptual quality, which may degenerate the downstream tasks. To address this issue, we propose the first coding framework for compressed video understanding, where another learnable perceptual bitstream is introduced and simultaneously transported with the video bitstream. With the sophisticatedly designed optimization target and network architectures, this new stream largely boosts the perceptual quality of the decoded videos yet with a small bit cost. Our framework can enjoy the best of both two worlds, (1) highly efficient content-coding of industrial video codec and (2) flexible perceptual-coding of neural networks (NNs). Finally, we build a rigorous benchmark for compressed video understanding over four different compression levels, six large-scale datasets, and two popular tasks. The proposed Dual-bitstream Perceptual Video Coding framework Dual-PVC consistently demonstrates significantly stronger performances than the baseline codec under the same bitrate level.

Abstract (translated)

URL

https://arxiv.org/abs/2202.02813

PDF

https://arxiv.org/pdf/2202.02813.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot