Abstract
Learned Image Compression (LIC) has shown remarkable progress in recent years. Existing works commonly employ CNN-based or self-attention-based modules as transforms for compression. However, no prior research has explored neural transforms that focus on specific regions. In response, we introduce class-agnostic segmentation masks (i.e., semantic masks without category labels) to extract region-adaptive contextual information. Our proposed module, the Region-Adaptive Transform, applies adaptive convolutions to different regions under the guidance of the masks. Additionally, we introduce a plug-and-play module named the Scale Affine Layer to incorporate rich contexts from various regions. While prior image compression efforts have used segmentation masks as additional intermediate inputs, our approach differs significantly: to avoid extra bitrate overhead, we treat the masks as privileged information, which is accessible during the model training stage but not required during the inference phase. To the best of our knowledge, we are the first to employ class-agnostic masks as privileged information and achieve superior performance on pixel-fidelity metrics such as Peak Signal-to-Noise Ratio (PSNR). Experimental results demonstrate our improvement over previously well-performing methods, with about an 8.2% bitrate saving compared to VTM-17.0. The code will be released at this https URL.
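The core idea of the Region-Adaptive Transform — applying a different convolution to each mask region and merging the results — can be illustrated with a minimal sketch. This is a hypothetical single-channel NumPy illustration, not the paper's implementation: the function name, the per-region kernel dictionary, and the naive 3x3 convolution loop are all assumptions made for clarity.

```python
import numpy as np

def region_adaptive_transform(features, mask, kernels):
    """Hypothetical sketch of a mask-guided, region-adaptive convolution.

    features: (H, W) single-channel feature map
    mask:     (H, W) integer map of class-agnostic region ids (no labels)
    kernels:  dict mapping region id -> (3, 3) convolution kernel

    Each region gets its own kernel; outputs are merged by the mask.
    """
    H, W = features.shape
    padded = np.pad(features, 1)          # zero-pad so output keeps (H, W)
    out = np.zeros_like(features)
    for region_id, kernel in kernels.items():
        # Naive 3x3 convolution of the whole map with this region's kernel.
        conv = np.zeros_like(features)
        for i in range(H):
            for j in range(W):
                conv[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
        # Keep only the pixels that belong to this region.
        out[mask == region_id] = conv[mask == region_id]
    return out
```

For example, with an identity kernel on region 0 and a doubling kernel on region 1, pixels in region 0 pass through unchanged while pixels in region 1 are scaled by two. A real LIC model would realize this with learned multi-channel kernels inside the analysis/synthesis transforms rather than a Python loop.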
URL
https://arxiv.org/abs/2403.00628