Paper Reading AI Learner

Generative Preprocessing for Image Compression with Pre-trained Diffusion Models

2025-12-17 10:22:11
Mengxi Guo, Shijie Zhao, Junlin Li, Li Zhang

Abstract

Preprocessing is a well-established technique for optimizing compression, yet existing methods are predominantly Rate-Distortion (R-D) optimized and constrained by pixel-level fidelity. This work pioneers a shift toward Rate-Perception (R-P) optimization, adapting a large-scale pre-trained diffusion model for compression preprocessing for the first time. We propose a two-stage framework: first, we distill the multi-step Stable Diffusion 2.1 into a compact, one-step image-to-image model using Consistent Score Identity Distillation (CiD). Second, we perform parameter-efficient fine-tuning of the distilled model's attention modules, guided by a Rate-Perception loss and a differentiable codec surrogate. Our method integrates seamlessly with standard codecs without any modification and leverages the model's powerful generative priors to enhance texture and mitigate artifacts. Experiments show substantial R-P gains, achieving up to a 30.13% BD-rate reduction in DISTS on the Kodak dataset and delivering superior subjective visual quality.
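The headline 30.13% figure is a Bjøntegaard-delta (BD) rate, here computed over a rate-perception curve rather than the usual rate-PSNR curve. As a hedged illustration (not the paper's code), the sketch below implements the standard BD-rate calculation in NumPy: fit cubic polynomials of log-bitrate versus quality for the anchor and test codecs, integrate both over the overlapping quality range, and convert the average log-rate difference to a percentage. Since DISTS is a lower-is-better metric, one would negate it before passing it as the quality axis; all names below are our own.

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjøntegaard-delta rate (%) of the test curve vs. the anchor curve.

    Each curve is a set of (bitrate, quality) points, with quality on a
    higher-is-better scale (e.g. PSNR, or negated DISTS).
    """
    log_rate_a = np.log(np.asarray(rate_anchor, dtype=float))
    log_rate_t = np.log(np.asarray(rate_test, dtype=float))

    # Fit cubic polynomials: log-rate as a function of quality.
    poly_a = np.polyfit(quality_anchor, log_rate_a, 3)
    poly_t = np.polyfit(quality_test, log_rate_t, 3)

    # Integrate only over the quality interval both curves cover.
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Average log-rate difference over the interval, converted to percent.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```

For instance, a test curve that reaches the same quality at 20% lower bitrate everywhere yields a BD-rate of -20%; a negative value therefore means the preprocessed pipeline needs fewer bits for the same perceptual quality.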

URL

https://arxiv.org/abs/2512.15270

PDF

https://arxiv.org/pdf/2512.15270.pdf
