VQ-NeRV: A Vector Quantized Neural Representation for Videos

2024-03-19 03:19:07
Yunjie Xu, Xiang Feng, Feiwei Qin, Ruiquan Ge, Yong Peng, Changmiao Wang

Abstract

Implicit neural representations (INRs) excel at encoding videos within neural networks and show promise in computer vision tasks such as video compression and denoising. INR-based approaches reconstruct video frames from content-agnostic embeddings, which hampers their efficacy in video frame regression and restricts their generalization ability for video interpolation. To address these deficiencies, Hybrid Neural Representation for Videos (HNeRV) was introduced with content-adaptive embeddings. Nevertheless, HNeRV's compression ratios remain relatively low because it does not exploit the network's shallow features and inter-frame residual information. In this work, we introduce an advanced U-shaped architecture, Vector Quantized-NeRV (VQ-NeRV), which integrates a novel component, the VQ-NeRV Block. This block incorporates a codebook mechanism to discretize the network's shallow residual features and inter-frame residual information effectively. This approach is particularly advantageous in video compression, as it yields a smaller size than quantized features. Furthermore, we introduce an original codebook optimization technique, termed shallow codebook optimization, designed to improve the utility and efficiency of the codebook. Experimental evaluations indicate that VQ-NeRV outperforms HNeRV on video regression tasks, delivering superior reconstruction quality (an increase of 1-2 dB in Peak Signal-to-Noise Ratio (PSNR)), better bits per pixel (bpp) efficiency, and improved video inpainting outcomes.
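
To make the codebook idea in the abstract more concrete, the following is a minimal NumPy sketch of generic vector quantization: each feature vector is replaced by the index of its nearest codebook entry, so only integer indices plus the codebook need to be stored. This is not the authors' VQ-NeRV Block, and it does not cover shallow codebook optimization. The function names (`quantize`, `index_bits`, `psnr`), the squared-L2 distance, and the fixed-length index coding are all assumptions chosen for illustration.

```python
import numpy as np


def quantize(features, codebook):
    """Assign each feature vector to its nearest codebook entry.

    features: array of shape (N, D); codebook: array of shape (K, D).
    Returns (indices, quantized) with shapes (N,) and (N, D).
    Assumption: nearest neighbour under squared L2 distance.
    """
    diffs = features[:, None, :] - codebook[None, :, :]  # (N, K, D)
    dists = (diffs ** 2).sum(axis=-1)                     # (N, K)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


def index_bits(num_vectors, codebook_size):
    """Bits needed to store the indices with fixed-length codes (assumption)."""
    return num_vectors * int(np.ceil(np.log2(codebook_size)))


def psnr(reference, reconstruction, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB for signals with maximum value `peak`."""
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


if __name__ == "__main__":
    # Toy data only: a 4-entry codebook and two 4-dimensional feature vectors.
    codebook = np.eye(4)
    features = np.array([[0.9, 0.1, 0.0, 0.0],
                         [0.0, 0.0, 0.2, 0.8]])
    indices, quantized = quantize(features, codebook)
    print("indices:", indices)
    print("bits for indices:", index_bits(len(indices), len(codebook)))
    print("PSNR (dB):", psnr(features, quantized))
```

In this sketch, the storage cost of the discretized features grows with the number of vectors and the logarithm of the codebook size, which is the general reason a codebook can be cheaper than storing per-element quantized values.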

URL

https://arxiv.org/abs/2403.12401

PDF

https://arxiv.org/pdf/2403.12401.pdf

