Paper Reading AI Learner

Progressive Learned Image Compression for Machine Perception

2025-12-23 05:45:38
Jungwoo Kim, Jun-Hyuk Kim, Jong-Seok Lee

Abstract

Recent advances in learned image codecs have been extended from human perception toward machine perception. However, progressive image compression with fine granular scalability (FGS), which enables decoding a single bitstream at multiple quality levels, remains unexplored for machine-oriented codecs. In this work, we propose PICM-Net, a novel progressive learned image compression codec for machine perception based on trit-plane coding. By analyzing how rate-distortion priorities differ between human- and machine-oriented codecs, we systematically examine latent prioritization strategies for machine-oriented codecs. To further enhance real-world adaptability, we design an adaptive decoding controller that dynamically determines, at inference time, the decoding level needed to maintain the desired confidence of the downstream machine prediction. Extensive experiments demonstrate that our approach enables efficient and adaptive progressive transmission while maintaining high performance on the downstream classification task, establishing a new paradigm for machine-aware progressive image compression.
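The abstract combines two mechanisms: trit-plane coding, where quantized latents are sent one base-3 digit plane at a time (most significant first), and a controller that stops decoding once the downstream prediction is confident enough. The sketch below illustrates both under stated assumptions; it is not PICM-Net's implementation. It assumes latents have already been shifted to non-negative integers, omits entropy coding and latent prioritization entirely, reconstructs partially received values at the midpoint of their remaining interval (real trit-plane codecs typically use probability-weighted reconstruction), and uses a max-softmax threshold as the stopping rule. The names `to_trit_planes`, `reconstruct`, `adaptive_decode`, and the toy `classify` callable are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def to_trit_planes(values: Sequence[int], num_planes: int) -> List[List[int]]:
    """Split non-negative integers into base-3 digit planes, most significant first.

    Assumption: latents were already offset into [0, 3**num_planes).
    """
    limit = 3 ** num_planes
    if any(v < 0 or v >= limit for v in values):
        raise ValueError("values must lie in [0, 3**num_planes)")
    return [[(v // 3 ** p) % 3 for v in values] for p in range(num_planes - 1, -1, -1)]


def reconstruct(received: Sequence[Sequence[int]], num_planes: int, size: int) -> List[float]:
    """Reconstruct values from the first k received planes.

    Simplification: each value is placed at the midpoint of the integer interval
    still consistent with its received trits (not a probability-weighted estimate).
    """
    prefix = [0] * size
    for plane in received:
        prefix = [3 * a + t for a, t in zip(prefix, plane)]
    width = 3 ** (num_planes - len(received))  # integers still consistent with each prefix
    return [a * width + (width - 1) / 2 for a in prefix]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_decode(
    planes: Sequence[Sequence[int]],
    num_planes: int,
    classify: Callable[[List[float]], Sequence[float]],
    threshold: float,
) -> Tuple[int, int, float]:
    """Decode plane by plane until the classifier's top probability reaches `threshold`.

    Hypothetical stopping rule; returns (planes used, predicted label, confidence).
    Falls back to all planes if the threshold is never reached.
    """
    size = len(planes[0])
    for k in range(1, num_planes + 1):
        latent = reconstruct(planes[:k], num_planes, size)
        probs = softmax(classify(latent))
        confidence = max(probs)
        if confidence >= threshold:
            break
    return k, probs.index(confidence), confidence


if __name__ == "__main__":
    planes = to_trit_planes([20, 3, 7, 0], num_planes=3)
    # Toy "classifier" on the decoded latent: class 0 scores z[2] - z[1], class 1 scores 0.
    toy = lambda z: [z[2] - z[1], 0.0]
    print(adaptive_decode(planes, 3, toy, threshold=0.9))
```

In this toy run the first plane alone leaves the two classes tied (confidence 0.5), while the second plane separates them, so decoding stops after two of three planes. Raising `threshold` trades extra bits for a more confident prediction, which is the kind of rate/confidence trade-off the abstract describes the controller managing.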

URL

https://arxiv.org/abs/2512.20070

PDF

https://arxiv.org/pdf/2512.20070.pdf
