Abstract
In recent years, demand for image compression models aimed at machine vision has grown dramatically. However, image compression training frameworks still target human vision and preserve excessive perceptual detail, so they are limited in how far they can reduce bits per pixel when the images are consumed by machine vision tasks. In this paper, we propose Semantic-based Low-bitrate Image compression for Machines, termed SLIM, which leverages diffusion. SLIM is a new, effective training framework for machine-oriented image compression built on a pretrained latent diffusion model. The compressor model of our method focuses only on the Region-of-Interest (RoI) areas for machine vision in the image latent, compressing it compactly. The pretrained UNet model then enhances the decompressed latent using an RoI-focused text caption that contains the semantic information of the image. SLIM is therefore able to focus on RoI areas of the image without any guide mask at the inference stage, achieving a low bitrate when compressing. SLIM can also enhance the decompressed latent through denoising steps, so the final image reconstructed from the enhanced latent is optimized for the machine vision task while still containing perceptual detail for human vision. Experimental results show that SLIM achieves higher classification accuracy at the same bits-per-pixel budget than conventional image compression models for machines.
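The two-stage pipeline the abstract describes (allocate bits to RoI areas of the latent, then refine the decompressed latent with denoising steps) can be sketched in toy form. Everything below is an illustrative assumption, not the paper's method: SLIM uses a learned compressor and a pretrained text-conditioned UNet, whereas this sketch stands in uniform quantization and a generic `denoise_fn` callback.

```python
import numpy as np

def quantize(x, bits):
    # Uniform quantization of values in [-1, 1] to 2**bits levels
    # (toy stand-in for a learned entropy-coded compressor).
    levels = 2 ** bits - 1
    return np.round((np.clip(x, -1.0, 1.0) + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

def compress_latent(latent, roi_mask, roi_bits=6, bg_bits=2):
    # RoI regions of the latent keep more bits; the background is
    # quantized coarsely, mimicking RoI-focused bit allocation.
    return np.where(roi_mask, quantize(latent, roi_bits), quantize(latent, bg_bits))

def enhance_latent(latent, denoise_fn, steps=4):
    # Iteratively refine the decompressed latent; in SLIM this role is
    # played by the pretrained UNet conditioned on an RoI-focused caption.
    for _ in range(steps):
        latent = denoise_fn(latent)
    return latent
```

With a fine/coarse split like this, the reconstruction error inside the RoI is much smaller than outside it, which is the behavior the abstract attributes to the compressor; the denoising stage then recovers perceptual detail lost to coarse background quantization.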
URL
https://arxiv.org/abs/2512.18200