SigVIC: Spatial Importance Guided Variable-Rate Image Compression

2023-03-16 06:57:51
Jiaming Liang, Meiqin Liu, Chao Yao, Chunyu Lin, Yao Zhao

Abstract

Variable-rate mechanisms have improved the flexibility and efficiency of learning-based image compression, which otherwise requires training a separate model for each rate-distortion tradeoff. One of the most common variable-rate approaches is to scale the internal features channel-wise or uniformly across spatial positions. However, spatial importance varies across an image, and this variation can inform bit allocation. In this paper, we introduce Spatial Importance Guided Variable-rate Image Compression (SigVIC), in which a spatial gating unit (SGU) is designed to adaptively learn a spatial importance mask. A spatial scaling network (SSN) then uses the spatial importance mask to guide feature scaling and bit allocation for variable-rate coding. Moreover, to improve the quality of the decoded image, the Top-K shallow features are selected to refine the decoded features through a shallow feature fusion module (SFFM). Experiments show that our method outperforms other learning-based methods (whether variable-rate or not) and traditional codecs, while saving storage and offering high flexibility.
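
The difference between channel-wise scaling and spatially guided scaling can be illustrated with a minimal NumPy sketch. This is not the authors' SGU or SSN: the importance mask is passed in as a plain array rather than learned, and the function names, the `rate_level` parameter, and the gain rule are all assumptions made for illustration only.

```python
import numpy as np


def channel_wise_scale(features, gains):
    """Scale a (C, H, W) feature map with one gain per channel."""
    return features * gains[:, None, None]


def spatially_guided_scale(features, importance_mask, rate_level):
    """Scale a (C, H, W) feature map with one gain per spatial position.

    importance_mask: (H, W) array with values in [0, 1]. In the paper the
        mask is learned; here it is simply an input (assumption).
    rate_level: scalar in [0, 1]; a hypothetical rate knob where larger
        values keep more detail everywhere.
    """
    # Hypothetical gain rule: every position gets at least `rate_level`,
    # and positions marked as important are pushed toward a gain of 1.
    gain = rate_level + (1.0 - rate_level) * importance_mask
    return features * gain[None, :, :]


def quantize(features):
    """Round to the nearest integer, standing in for the quantizer."""
    return np.rint(features)


# Toy input: a constant feature map and a mask marking the top-left 2x2 block.
features = np.full((2, 4, 4), 0.8)
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0

symbols = quantize(spatially_guided_scale(features, mask, rate_level=0.5))
nonzero_per_position = np.count_nonzero(symbols, axis=0)
```

In this toy setup the features at unimportant positions are scaled down far enough to round to zero, while those in the marked block survive quantization, so the nonzero symbols (a rough proxy for bits spent) concentrate where the mask is high. A per-channel gain cannot express this, because it applies the same factor at every spatial position.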

URL

https://arxiv.org/abs/2303.09112

PDF

https://arxiv.org/pdf/2303.09112.pdf

