
Region-Adaptive Transform with Segmentation Prior for Image Compression

2024-03-01 16:03:37
Yuxi Liu, Wenhan Yang, Huihui Bai, Yunchao Wei, Yao Zhao


Learned Image Compression (LIC) has shown remarkable progress in recent years. Existing works commonly employ CNN-based or self-attention-based modules as transform methods for compression. However, no prior research on neural transforms focuses on specific regions. In response, we introduce class-agnostic segmentation masks (i.e., semantic masks without category labels) for extracting region-adaptive contextual information. Our proposed module, Region-Adaptive Transform, applies adaptive convolutions to different regions guided by the masks. Additionally, we introduce a plug-and-play module named Scale Affine Layer to incorporate rich contexts from various regions. While prior image compression efforts have used segmentation masks as additional intermediate inputs, our approach differs significantly from them. To avoid extra bitrate overhead, we treat these masks as privileged information, which is accessible during model training but not required during inference. To the best of our knowledge, we are the first to employ class-agnostic masks as privileged information and achieve superior performance in pixel-fidelity metrics such as Peak Signal-to-Noise Ratio (PSNR). The experimental results demonstrate our improvement over previously well-performing methods, with about 8.2% bitrate saving compared to VTM-17.0. The code will be released at this https URL.
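The general idea of filtering different regions with different kernels under the guidance of a class-agnostic mask can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the paper's Region-Adaptive Transform: the function names, the hand-picked kernels, and the 4x4 image are all hypothetical, the kernels are fixed rather than learned, and the mask is used directly at filtering time, whereas the abstract describes masks that are needed only during training.

```python
def conv2d_same(image, kernel):
    """Zero-padded 'same' 2D cross-correlation on a list-of-lists image."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Pixels outside the image count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def region_adaptive_conv(image, mask, kernels):
    """Filter each pixel with the kernel assigned to its region id.

    mask[y][x] is an integer region id with no class meaning attached.
    kernels maps region id -> 2D kernel (hypothetical, hand-picked here;
    a learned model would predict or train these instead).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for region_id, kernel in kernels.items():
        filtered = conv2d_same(image, kernel)
        for y in range(h):
            for x in range(w):
                if mask[y][x] == region_id:
                    out[y][x] = filtered[y][x]
    return out


# Toy 4x4 image split into a left region (id 0) and a right region (id 1).
image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
mask = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
box_blur = [[1.0 / 9.0] * 3 for _ in range(3)]
kernels = {0: identity, 1: box_blur}

result = region_adaptive_conv(image, mask, kernels)
```

In this toy setup the left region passes through unchanged while the right region is smoothed, which is the sense in which the filtering adapts to the region. The region ids carry no category label, matching the abstract's description of class-agnostic masks.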



