
Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization

2024-03-19 04:19:11
Jixiang Luo, Yan Wang, Hongwei Qin

Abstract

Learned Image Compression (LIC) has made dramatic progress on both objective and subjective metrics. MSE-based models aim to improve objective metrics, while generative models are leveraged to improve visual quality as measured by subjective metrics. However, both suffer from blurring or deformation at low bit rates, especially below $0.2$ bpp. Moreover, deformation of human faces and text is unacceptable in visual quality assessment, and the problem becomes more prominent for small faces and small text. To solve this problem, we combine the advantages of MSE-based models and generative models by utilizing regions of interest (ROI). We propose Hierarchical-ROI (H-ROI), which splits an image into several foreground regions and one background region to improve the reconstruction of regions containing faces, text, and complex textures. Further, we propose adaptive quantization by non-linear mapping within the channel dimension to constrain the bit rate while maintaining visual quality. Extensive experiments demonstrate that our methods achieve better visual quality on small faces and text at lower bit rates, e.g., $0.7\times$ the bits of HiFiC and $0.5\times$ the bits of BPG.
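To make the bit-rate figures in the abstract concrete, the following minimal sketch converts between compressed file size and bits per pixel (bpp), using the standard definition bpp = total bits / (width × height). The function names and the 768×512 image size are assumptions chosen for illustration; they do not come from the paper, and the 0.7 factor is applied to a hypothetical baseline budget rather than to any measured HiFiC result.

```python
def bits_per_pixel(num_bytes: int, width: int, height: int) -> float:
    """Bit rate of a compressed image in bits per pixel (bpp)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return num_bytes * 8 / (width * height)


def byte_budget(target_bpp: float, width: int, height: int) -> float:
    """Compressed size in bytes that corresponds to a target bpp."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return target_bpp * width * height / 8


# Hypothetical example: a 768x512 image (size chosen for illustration only).
WIDTH, HEIGHT = 768, 512

# Bytes available when coding this image at 0.2 bpp.
baseline_budget = byte_budget(0.2, WIDTH, HEIGHT)

# If a baseline codec needed that whole budget, 0.7x of its bits would be:
reduced_budget = 0.7 * baseline_budget
```

At 0.2 bpp the whole 768×512 image must fit in under 10 kB, which gives a sense of why small faces and text are hard to preserve at such rates.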

URL

https://arxiv.org/abs/2403.13030

PDF

https://arxiv.org/pdf/2403.13030.pdf

