Abstract
In theory, vector quantization (VQ) is always better than scalar quantization (SQ) in terms of rate-distortion (R-D) performance. Recent state-of-the-art methods for neural image compression are mainly based on nonlinear transform coding (NTC) with uniform scalar quantization, overlooking the benefits of VQ due to its exponentially increasing complexity. In this paper, we first investigate some toy sources, demonstrating that even though modern neural networks considerably enhance the compression performance of SQ via nonlinear transforms, there is still an insurmountable chasm between SQ and VQ. Therefore, revolving around VQ, we propose a novel framework for neural image compression named Nonlinear Vector Transform Coding (NVTC). NVTC solves the critical complexity issue of VQ through (1) a multi-stage quantization strategy and (2) nonlinear vector transforms. In addition, we apply entropy-constrained VQ in latent space to adaptively determine the quantization boundaries for joint rate-distortion optimization, which improves the performance both theoretically and experimentally. Compared to previous NTC approaches, NVTC demonstrates superior rate-distortion performance, faster decoding speed, and smaller model size. Our code is available at this https URL
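The abstract's mention of entropy-constrained VQ refers to assigning each latent vector to the codeword minimizing a Lagrangian cost (distortion plus λ times rate), rather than the nearest codeword, so that quantization boundaries adapt to the rate-distortion trade-off. The following is a minimal illustrative sketch of that assignment rule, not the paper's implementation; the codebook, codeword probabilities, and λ values are assumptions for demonstration.

```python
import math

def ecvq_assign(x, codebook, probs, lam):
    """Index of the codeword minimizing distortion + lam * rate for vector x.

    With lam = 0 this reduces to nearest-neighbor VQ; larger lam biases
    assignment toward high-probability (i.e., cheap-to-code) codewords.
    """
    best_i, best_cost = 0, float("inf")
    for i, (c, p) in enumerate(zip(codebook, probs)):
        dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))  # squared error
        rate = -math.log2(p)  # ideal code length of codeword i, in bits
        cost = dist + lam * rate
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i

# Illustrative 2-D codebook with learned codeword priors (assumed values).
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
probs = [0.6, 0.3, 0.1]
x = (1.4, 1.4)

print(ecvq_assign(x, codebook, probs, 0.0))  # nearest-neighbor assignment
print(ecvq_assign(x, codebook, probs, 5.0))  # rate-biased assignment
```

At λ = 0 the point (1.4, 1.4) maps to its nearest codeword (1, 1); at a large λ the cheaper high-probability codeword (0, 0) wins despite higher distortion, which is exactly how entropy-constrained VQ moves quantization boundaries during joint R-D optimization.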
URL
https://arxiv.org/abs/2305.16025