HVQ-CGIC: Enabling Hyperprior Entropy Modeling for VQ-Based Controllable Generative Image Compression

2025-12-08 06:10:07
Niu Yi, Xu Tianyi, Ma Mingming, Wang Xinkun

Abstract

Generative learned image compression methods based on Vector Quantization (VQ) have recently shown impressive potential in balancing distortion and perceptual quality. However, these methods typically estimate the entropy of VQ indices using a static, global probability distribution, which cannot adapt to the specific content of each image. This non-adaptive approach leaves potential bitrate savings untapped and makes flexible rate control difficult. To address this challenge, we introduce a Controllable Generative Image Compression framework based on a VQ Hyperprior, termed HVQ-CGIC. HVQ-CGIC rigorously derives the mathematical foundation for introducing a hyperprior into the entropy model of the VQ indices. Building on this foundation and a novel loss design, it is, to our knowledge, the first framework to introduce rate-distortion (RD) balance and control into VQ-based generative image compression. Paired with a lightweight hyperprior estimation network, HVQ-CGIC achieves a significant advantage in RD performance over current state-of-the-art (SOTA) generative compression methods. On the Kodak dataset, it reaches the same LPIPS as Control-GIC, CDC and HiFiC with an average of 61.3% fewer bits. We posit that HVQ-CGIC has the potential to become a foundational component for VQGAN-based image compression, analogous to the integral role of the HyperPrior framework in neural image compression.
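To make the contrast between a static global distribution and a content-adaptive one concrete, here is a minimal toy sketch. It is not the HVQ-CGIC method: the codebook size, the index values, the skewed distribution and the side-information cost are all hypothetical numbers chosen for illustration. It only shows that the ideal code length of a sequence of VQ indices, the sum of -log2 p(k), can shrink when the distribution matches the image content, even after paying for the side information that describes that distribution.

```python
import math


def code_length_bits(indices, probs_per_position):
    """Ideal entropy-coding cost: sum of -log2 p_i(k_i) over all positions."""
    return sum(-math.log2(p[k]) for k, p in zip(indices, probs_per_position))


# Toy codebook with 4 entries and one image's VQ indices (hypothetical values).
indices = [0, 0, 0, 1, 0, 0, 2, 0]

# Static global prior: one distribution shared by every position and image.
global_prior = [0.25, 0.25, 0.25, 0.25]
static_bits = code_length_bits(indices, [global_prior] * len(indices))

# Content-adaptive prior: stands in for a distribution that a hyperprior
# network would predict from transmitted side information. Here it is a
# hand-picked skewed distribution, not a learned one.
adaptive_prior = [0.7, 0.1, 0.1, 0.1]
side_info_bits = 4.0  # assumed cost of sending the side information
adaptive_bits = side_info_bits + code_length_bits(
    indices, [adaptive_prior] * len(indices)
)

print(f"static prior:   {static_bits:.2f} bits")
print(f"adaptive prior: {adaptive_bits:.2f} bits (incl. side info)")
```

In this toy case the adaptive total is lower because most indices fall on the entry the skewed distribution favors. If the side information cost more than the savings it enabled, the static prior would win, which is why the abstract stresses a lightweight hyperprior network.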

URL

https://arxiv.org/abs/2512.07192

PDF

https://arxiv.org/pdf/2512.07192.pdf

