Paper Reading AI Learner

NVTC: Nonlinear Vector Transform Coding

2023-05-25 13:06:38
Runsen Feng, Zongyu Guo, Weiping Li, Zhibo Chen

Abstract

In theory, vector quantization (VQ) is always better than scalar quantization (SQ) in terms of rate-distortion (R-D) performance. Recent state-of-the-art methods for neural image compression are mainly based on nonlinear transform coding (NTC) with uniform scalar quantization, overlooking the benefits of VQ because of its exponentially increasing complexity. In this paper, we first conduct experiments on some toy sources, demonstrating that although modern neural networks considerably enhance the compression performance of SQ with nonlinear transforms, there is still an insurmountable chasm between SQ and VQ. Therefore, revolving around VQ, we propose a novel framework for neural image compression named Nonlinear Vector Transform Coding (NVTC). NVTC solves the critical complexity issue of VQ through (1) a multi-stage quantization strategy and (2) nonlinear vector transforms. In addition, we apply entropy-constrained VQ in latent space to adaptively determine the quantization boundaries for joint rate-distortion optimization, which improves the performance both theoretically and experimentally. Compared to previous NTC approaches, NVTC demonstrates superior rate-distortion performance, faster decoding speed, and smaller model size. Our code is available at this https URL
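The abstract's two complexity fixes, multi-stage quantization and entropy-constrained VQ, can be illustrated with a short sketch. The snippet below is not the authors' implementation and omits the nonlinear vector transforms entirely; the codebooks, codeword probabilities, and the Lagrange multiplier lam are hypothetical toy values, used only to show how an entropy-constrained, residual-style multi-stage quantizer trades distortion against rate.

```python
# Minimal sketch (assumed toy setup, not the NVTC code): multi-stage (residual)
# vector quantization with entropy-constrained codeword selection.
import numpy as np

def ecvq_encode(x, codebook, probs, lam):
    """Entropy-constrained VQ: pick the codeword minimizing
    distortion + lam * rate, where rate = -log2(probability of the codeword)."""
    dist = np.sum((codebook - x) ** 2, axis=1)   # squared error to each codeword
    rate = -np.log2(probs)                       # ideal code length per codeword
    return int(np.argmin(dist + lam * rate))

def multistage_encode(x, codebooks, probs_list, lam):
    """Quantize x in several stages; each stage codes the residual left by the
    previous stages, which keeps every per-stage codebook small."""
    indices, residual = [], x.copy()
    for codebook, probs in zip(codebooks, probs_list):
        i = ecvq_encode(residual, codebook, probs, lam)
        indices.append(i)
        residual = residual - codebook[i]
    return indices

def multistage_decode(indices, codebooks):
    """Reconstruction is the sum of the selected codewords from all stages."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, num_codes, num_stages = 4, 16, 3
    # Toy codebooks with shrinking scale per stage, and uniform codeword priors.
    codebooks = [rng.normal(size=(num_codes, dim)) * 0.5 ** s for s in range(num_stages)]
    probs_list = [np.full(num_codes, 1.0 / num_codes) for _ in range(num_stages)]
    x = rng.normal(size=dim)
    idx = multistage_encode(x, codebooks, probs_list, lam=0.1)
    x_hat = multistage_decode(idx, codebooks)
    print("indices:", idx, "mse:", float(np.mean((x - x_hat) ** 2)))
```

With three stages of 16 codewords each, the sketch spans 16^3 effective reconstruction points while only ever searching 16 codewords per stage, which is the basic reason a multi-stage scheme sidesteps the exponential complexity of a single flat VQ codebook.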

Abstract (translated)

In theory, vector quantization (VQ) always outperforms scalar quantization (SQ) in terms of rate-distortion (R-D) performance. Recent state-of-the-art neural image compression methods are mainly based on nonlinear transform coding (NTC) with uniform scalar quantization, overlooking the benefits of VQ because of its exponentially increasing complexity. In this paper, we first study some toy sources and show that although modern neural networks considerably enhance the compression performance of SQ with nonlinear transforms, there is still an insurmountable gap between SQ and VQ. Therefore, revolving around VQ, we propose a new framework for neural image compression named Nonlinear Vector Transform Coding (NVTC). NVTC solves the critical complexity issue of VQ through (1) a multi-stage quantization strategy and (2) nonlinear vector transforms. In addition, we apply entropy-constrained VQ in latent space to adaptively determine the quantization boundaries for joint rate-distortion optimization, which improves performance both theoretically and experimentally. Compared to previous NTC approaches, NVTC demonstrates better rate-distortion performance, faster decoding speed, and smaller model size. Our code is available at this https URL.

URL

https://arxiv.org/abs/2305.16025

PDF

https://arxiv.org/pdf/2305.16025.pdf
