Paper Reading AI Learner

Domain Adaptation for Learned Image Compression with Supervised Adapters

2024-04-24 01:50:36
Alberto Presta, Gabriele Spadaro, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto

Abstract

In Learned Image Compression (LIC), a model is trained to encode and decode images sampled from a source domain, often outperforming traditional codecs on natural images; yet its performance may be far from optimal on images sampled from different domains. In this work, we tackle the problem of adapting a pre-trained model to multiple target domains by plugging an adapter module into the decoder for each of them, including the source one. Each adapter improves the decoder's performance on a specific domain without the model forgetting the images seen at training time. A gate network computes the weights that optimally blend the adapters' contributions when the bitstream is decoded. We experimentally validate our method on two state-of-the-art pre-trained models, observing improved rate-distortion efficiency on the target domains without penalties on the source domain. Furthermore, the gate's ability to identify similarities with the learned target domains also improves encoding efficiency for images outside them.
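
Based only on the abstract above, the decoder-side mechanism can be pictured as a gated mixture of per-domain adapters applied to decoder features. The sketch below is a hypothetical PyTorch rendering of that idea: the module name `GatedAdapterBlend`, the 1x1-convolution adapters, the gate architecture, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GatedAdapterBlend(nn.Module):
    """Illustrative sketch (assumed structure): blend per-domain adapter
    outputs with soft weights produced by a gate network."""

    def __init__(self, channels: int, num_domains: int, hidden: int = 64):
        super().__init__()
        # One lightweight adapter per domain (source + targets);
        # here modeled as simple 1x1 convolutions for illustration.
        self.adapters = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=1) for _ in range(num_domains)]
        )
        # Gate network: maps pooled decoder features to one weight per adapter.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_domains),
            nn.Softmax(dim=-1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x)  # (B, num_domains) blending weights
        outs = torch.stack([a(x) for a in self.adapters], dim=1)  # (B, D, C, H, W)
        blended = (w[:, :, None, None, None] * outs).sum(dim=1)   # weighted sum over domains
        return x + blended  # residual correction of the decoder features


# Example usage with arbitrary feature dimensions:
# blend = GatedAdapterBlend(channels=192, num_domains=4)
# y = blend(torch.randn(1, 192, 16, 16))
```

The residual formulation is one plausible way to realize the "no forgetting" property mentioned in the abstract: with near-uniform or near-zero adapter contributions, the decoder's original behavior on the source domain is preserved, while the gate can shift weight toward a target-domain adapter when the content calls for it.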

Abstract (translated)

In Learned Image Compression (LIC), a model is trained to encode and decode images sampled from a source domain, and it often outperforms traditional codecs on natural images; however, its performance may be far from optimal on images sampled from different domains. In this work, we adapt a pre-trained model to multiple target domains by plugging adapter modules into the decoder. Each adapter improves the decoder's performance on a specific domain without the model forgetting the images seen at training time. A gate network computes the weights that optimally blend the adapters' contributions at decoding time. We experimentally validate our method on two state-of-the-art pre-trained models, observing improved rate-distortion efficiency on the target domains without penalties on the source domain. Furthermore, the gate's ability to identify similarities with the learned target domains also improves encoding efficiency for images outside those domains.

URL

https://arxiv.org/abs/2404.15591

PDF

https://arxiv.org/pdf/2404.15591.pdf

