Abstract
Diffusion models can generate images of excellent perceptual quality. However, they struggle to guarantee low distortion, so the integration of diffusion models with image compression models remains underexplored. This paper presents a diffusion-based image compression method that uses a privileged end-to-end decoder as a correction, achieving better perceptual quality while keeping distortion under control to a certain extent. We build a diffusion model and design a novel paradigm that combines it with an end-to-end decoder; the latter is responsible for transmitting the privileged information extracted at the encoder side. Specifically, we theoretically analyze the reconstruction process of diffusion models at the encoder side, where the original images are visible. Based on this analysis, we introduce an end-to-end convolutional decoder that provides a better approximation of the score function $\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t)$ at the encoder side, and we effectively transmit the combination. Experiments demonstrate that our method outperforms previous perceptual compression methods in both distortion and perception.
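As a rough illustration of why the encoder side is "privileged", the sketch below uses the standard DDPM forward process (an assumption here; the abstract does not specify the noise schedule or parameterization). When the original image $\mathbf{x}_0$ is visible, the score of $q(\mathbf{x}_t \mid \mathbf{x}_0)$ is available in closed form, whereas a decoder can only estimate it from a noise prediction. The function names, the toy 16-value "image", the schedule value `alpha_bar_t = 0.5`, and the perturbed noise estimate are all hypothetical stand-ins, not the paper's method.

```python
import math
import random

def forward_noise(x0, alpha_bar_t, eps):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]

def conditional_score(x_t, x0, alpha_bar_t):
    """Score of the Gaussian q(x_t | x0) w.r.t. x_t; needs the original x0."""
    a = math.sqrt(alpha_bar_t)
    return [-(xt - a * x) / (1.0 - alpha_bar_t) for xt, x in zip(x_t, x0)]

def score_from_noise(eps_hat, alpha_bar_t):
    """Turn a noise estimate into a score estimate: -eps_hat / sqrt(1 - a)."""
    b = math.sqrt(1.0 - alpha_bar_t)
    return [-e / b for e in eps_hat]

rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]   # toy "image" (hypothetical)
eps = [rng.gauss(0.0, 1.0) for _ in range(16)]     # true forward noise
alpha_bar_t = 0.5                                  # hypothetical schedule value

x_t = forward_noise(x0, alpha_bar_t, eps)

# Encoder side: x0 is visible, so the conditional score is exact.
exact = conditional_score(x_t, x0, alpha_bar_t)

# Decoder side: only an imperfect noise estimate is available.
# The added perturbation is a stand-in for a real network's prediction error.
eps_hat = [e + 0.1 * rng.gauss(0.0, 1.0) for e in eps]
approx = score_from_noise(eps_hat, alpha_bar_t)

# Gap between the two scores; this is the kind of error that privileged
# information computed at the encoder could help correct.
score_error = [s - r for s, r in zip(exact, approx)]
```

With the true noise in place of `eps_hat`, `score_from_noise` reproduces `exact`, which is the usual identity linking noise prediction and score estimation.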
URL
https://arxiv.org/abs/2404.04916