Channel-wise Feature Decorrelation for Enhanced Learned Image Compression

2024-03-16 14:30:25
Farhad Pakdaman, Moncef Gabbouj

Abstract

The emerging field of Learned Compression (LC) replaces the traditional codec modules with Deep Neural Networks (DNNs), which are trained end-to-end for rate-distortion performance. This approach is considered the future of image/video compression, and major efforts have been dedicated to improving its compression efficiency. However, most proposed works target compression efficiency by employing more complex DNNs, which leads to higher computational complexity. As an alternative, this paper proposes to improve compression by fully exploiting the existing DNN capacity. To do so, the latent representation is guided to learn a richer and more diverse set of features, which corresponds to better reconstruction. A channel-wise feature decorrelation loss is designed and integrated into the LC optimization. Three strategies are proposed and evaluated, which optimize (1) the transformation network, (2) the context model, and (3) both networks. Experimental results on two established LC methods show that the proposed method improves compression, with a BD-Rate improvement of up to 8.06% and no added complexity. The proposed solution can be applied as a plug-and-play solution to optimize any similar LC method.
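
The abstract does not give the exact form of the decorrelation loss, so the following is only a minimal sketch of one common way such a term could be written: the mean squared Pearson correlation between every pair of latent channels, added to a rate-distortion objective. The function names, the weights `lam` and `gamma`, and the pure-Python list representation are assumptions made for illustration, not the authors' implementation. A real training setup would compute this on tensors in a deep learning framework.

```python
import math


def channel_decorrelation_loss(latent, eps=1e-8):
    """Mean squared Pearson correlation over all pairs of distinct channels.

    latent: list of C channels, each a flat list of N values
            (for example, an H x W feature map flattened to H*W).
    Returns 0.0 when channels are uncorrelated, close to 1.0 when
    every pair of channels is perfectly (anti-)correlated.
    """
    num_channels = len(latent)
    if num_channels < 2:
        return 0.0

    # Center each channel and record its L2 norm.
    centered, norms = [], []
    for channel in latent:
        mean = sum(channel) / len(channel)
        c = [v - mean for v in channel]
        centered.append(c)
        norms.append(math.sqrt(sum(v * v for v in c)))

    # Accumulate squared correlations over the upper triangle (i < j).
    total, pairs = 0.0, 0
    for i in range(num_channels):
        for j in range(i + 1, num_channels):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            corr = dot / (norms[i] * norms[j] + eps)
            total += corr * corr
            pairs += 1
    return total / pairs


def total_loss(rate, distortion, latent, lam=0.01, gamma=0.1):
    """Hypothetical combined objective: R + lam * D + gamma * L_decorr.

    lam and gamma are illustrative weights, not values from the paper.
    """
    return rate + lam * distortion + gamma * channel_decorrelation_loss(latent)
```

Because the extra term only changes the training objective, a loss of this kind leaves the network architecture and inference cost untouched, which is consistent with the abstract's claim of no added complexity.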

URL

https://arxiv.org/abs/2403.10936

PDF

https://arxiv.org/pdf/2403.10936.pdf

