Paper Reading AI Learner

Compressing image encoders via latent distillation

2026-01-09 08:50:38
Caroline Mazini Rodrigues (IRISA, CNRS), Nicolas Keriven (CNRS, IRISA, COMPACT), Thomas Maugey (COMPACT)

Abstract

Deep learning models for image compression often face practical limitations in hardware-constrained applications. Although these models achieve high-quality reconstructions, they are typically complex, heavyweight, and require substantial training data and computational resources. We propose a methodology to partially compress these networks by reducing the size of their encoders. Our approach uses a simplified knowledge distillation strategy to approximate the latent space of the original models with less data and shorter training, yielding lightweight encoders from heavyweight ones. We evaluate the resulting lightweight encoders across two different architectures on the image compression task. Experiments show that our method preserves reconstruction quality and statistical fidelity better than training lightweight encoders with the original loss, making it practical for resource-limited environments.
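The latent distillation idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the "teacher" is a small frozen two-layer network standing in for a heavyweight encoder, the "student" is a single linear layer standing in for a lightweight encoder, and the objective is a plain mean squared error between the two latents. The abstract does not state the actual loss, architectures, or training schedule.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

IN_DIM, HIDDEN, LATENT = 4, 8, 2  # hypothetical toy sizes


def rand_matrix(rows, cols, scale):
    """Matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Frozen "heavyweight" teacher encoder (assumed stand-in, never updated).
T1 = rand_matrix(HIDDEN, IN_DIM, 0.5)
T2 = rand_matrix(LATENT, HIDDEN, 0.5)


def teacher_encode(x):
    hidden = [math.tanh(a) for a in matvec(T1, x)]
    return matvec(T2, hidden)


# "Lightweight" student encoder: one linear layer, fewer parameters.
S = rand_matrix(LATENT, IN_DIM, 0.1)


def student_encode(x):
    return matvec(S, x)


def latent_mse(batch):
    """Mean squared error between student and teacher latents."""
    total = 0.0
    for x in batch:
        zt, zs = teacher_encode(x), student_encode(x)
        total += sum((a - b) ** 2 for a, b in zip(zs, zt)) / LATENT
    return total / len(batch)


def distill(batch, epochs=200, lr=0.05):
    """Per-sample SGD on the latent MSE; only the student weights change."""
    for _ in range(epochs):
        for x in batch:
            zt, zs = teacher_encode(x), student_encode(x)
            for i in range(LATENT):
                grad_out = 2.0 * (zs[i] - zt[i]) / LATENT
                for j in range(IN_DIM):
                    S[i][j] -= lr * grad_out * x[j]


# Synthetic inputs stand in for image data.
data = [[rng.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(64)]
loss_before = latent_mse(data)
distill(data)
loss_after = latent_mse(data)
print(f"latent MSE before: {loss_before:.6f}, after: {loss_after:.6f}")
```

The point of the sketch is that the student never sees a reconstruction or rate loss: it is trained only to match the teacher's latents, so the original decoder could in principle be reused unchanged. Whether the paper freezes the decoder or uses additional loss terms is not stated in the abstract.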


URL

https://arxiv.org/abs/2601.05639

PDF

https://arxiv.org/pdf/2601.05639.pdf

