Abstract
Learned Image Compression (LIC) has achieved dramatic progress in terms of both objective and subjective metrics. MSE-based models aim to improve objective metrics, while generative models are leveraged to improve visual quality as measured by subjective metrics. However, both suffer from blurring or deformation at low bit rates, especially below $0.2$ bpp. Moreover, deformation of human faces and text is unacceptable in visual quality assessment, and the problem is more pronounced for small faces and text. To solve this problem, we combine the advantages of MSE-based models and generative models by utilizing regions of interest (ROI). We propose Hierarchical-ROI (H-ROI), which splits an image into several foreground regions and one background region to improve the reconstruction of regions containing faces, text, and complex textures. Further, we propose adaptive quantization by non-linear mapping within the channel dimension to constrain the bit rate while maintaining visual quality. Extensive experiments demonstrate that our methods achieve better visual quality on small faces and text at lower bit rates, e.g., $0.7\times$ the bits of HiFiC and $0.5\times$ the bits of BPG.
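To make the bit-rate figures in the abstract concrete, the sketch below shows how bits per pixel (bpp) and a relative bit-rate factor such as $0.7\times$ are conventionally computed. The function names, image size, and file sizes are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Bit rate of a compressed image in bits per pixel (bpp)."""
    return num_bytes * 8 / (height * width)


def relative_bits(method_bpp: float, baseline_bpp: float) -> float:
    """Ratio of one codec's bit rate to a baseline's; 0.7 means 0.7x the bits."""
    return method_bpp / baseline_bpp


# Hypothetical compressed sizes for a 512 x 768 image (illustration only).
height, width = 512, 768
baseline_bpp = bits_per_pixel(9_830, height, width)  # roughly 0.2 bpp
method_bpp = bits_per_pixel(6_881, height, width)    # roughly 0.14 bpp

print(f"baseline: {baseline_bpp:.3f} bpp")
print(f"method:   {method_bpp:.3f} bpp")
print(f"relative: {relative_bits(method_bpp, baseline_bpp):.2f}x")
```

Under these assumed sizes, the second codec spends about 0.7 times the bits of the baseline, which is the sense in which the abstract compares against HiFiC and BPG.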
URL
https://arxiv.org/abs/2403.13030