In this paper, we introduce a Variational Autoencoder (VAE) based training approach that can compress and decompress cancer pathology slides at a compression ratio of 1:512, exceeding the previously reported state of the art (SOTA) in the literature, while still maintaining accuracy in clinical validation tasks. The compression approach was also tested on more common computer vision datasets such as CIFAR10, and we explore which image characteristics enable this compression ratio on cancer imaging data but not on generic images. We generate and visualize embeddings from the compressed latent space and demonstrate how they are useful for clinical interpretation of data, and how, in the future, such latent embeddings can be used to accelerate the search of clinical imaging data.
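To make the headline number concrete, the sketch below (an assumed toy architecture, not the authors' model) shows how a convolutional VAE-style encoder whose latent is stored at 8 bits per element accounts for a 1:512 ratio on 512x512 RGB patches; all layer sizes are illustrative.

```python
# Minimal sketch (not the authors' code): a convolutional VAE-style encoder/decoder whose
# latent, stored at 8 bits per element, gives a 1:512 ratio for 512x512 RGB uint8 patches.
import torch
import torch.nn as nn

class ToyVAECodec(nn.Module):
    def __init__(self, latent_channels: int = 6):
        super().__init__()
        ch = [3, 32, 64, 128, 128, latent_channels]
        enc = []
        for cin, cout in zip(ch[:-1], ch[1:]):
            enc += [nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.ReLU()]
        self.encoder = nn.Sequential(*enc[:-1])        # 512x512x3 -> 16x16x6 (no ReLU on latent)
        dec = []
        for cin, cout in zip(ch[::-1][:-1], ch[::-1][1:]):
            dec += [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.ReLU()]
        self.decoder = nn.Sequential(*dec[:-1])        # 16x16x6 -> 512x512x3

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

x = torch.rand(1, 3, 512, 512)
model = ToyVAECodec()
x_hat, z = model(x)
input_bytes = x.numel()      # source pixels assumed stored at 8 bits each: 512*512*3 = 786,432
latent_bytes = z.numel()     # 16*16*6 = 1,536 latent elements at 1 byte each
print(f"compression ratio 1:{input_bytes // latent_bytes}")   # -> 1:512
```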
https://arxiv.org/abs/2303.13332
The Internet has turned the entire world into a small village; it has made it possible to share millions of images and videos. However, sending and receiving huge amounts of data remains a major challenge, and addressing it requires new algorithms that reduce the number of image bits and represent the data in a compressed form. Image compression is therefore an important application for transferring large files and images, and efficient methods are needed in this field to achieve the best results. In this work, we propose a new algorithm based on the discrete Hermite wavelet transformation (DHWT) and evaluate the efficiency and quality it attains on color images. To compress a color image, the method analyzes it and divides it into approximation coefficients and detail coefficients after adding the wavelets into MATLAB. With Multi-Resolution Analysis (MRA), the appropriate filter is derived, and its mathematical properties are validated by testing the new filter and performing its operations. After decomposing the rows and columns of the image matrix and reconstructing it with the inverse filter, the quality of the resulting image is assessed with standard measures such as the peak signal-to-noise ratio (PSNR), compression ratio (CR), bits per pixel (BPP), and mean square error (MSE).
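As a rough illustration of the pipeline described above, the sketch below compresses a colour image with a standard PyWavelets filter (a stand-in for the Hermite-derived filter, which is not available in off-the-shelf libraries) and reports the same quality measures; the wavelet name, decomposition level, coefficient-keeping fraction, and 16-bit coefficient storage are assumptions.

```python
# Illustrative sketch only: multi-level wavelet compression of a colour image with a standard
# PyWavelets filter, followed by the quality measures listed above (MSE, PSNR, BPP, CR).
import numpy as np
import pywt

def compress_channel(channel, wavelet="bior4.4", level=3, keep=0.05):
    coeffs = pywt.wavedec2(channel, wavelet, level=level)          # approximation + detail coefficients
    arr, slices = pywt.coeffs_to_array(coeffs)
    thresh = np.quantile(np.abs(arr), 1.0 - keep)                  # keep the largest 5% of coefficients
    arr_c = np.where(np.abs(arr) >= thresh, arr, 0.0)
    rec = pywt.waverec2(pywt.array_to_coeffs(arr_c, slices, output_format="wavedec2"), wavelet)
    return rec[: channel.shape[0], : channel.shape[1]], np.count_nonzero(arr_c)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256, 3)).astype(np.float64)  # stand-in colour image

rec = np.empty_like(img)
nonzero = 0
for c in range(3):
    rec[..., c], nz = compress_channel(img[..., c])
    nonzero += nz

mse = np.mean((img - rec) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse)
bpp = nonzero * 16 / img[..., 0].size          # assume 16 bits per stored coefficient, positions ignored
cr = (img.size * 8) / (nonzero * 16)
print(f"MSE={mse:.2f}  PSNR={psnr:.2f} dB  BPP={bpp:.2f}  CR={cr:.1f}:1")
```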
https://arxiv.org/abs/2303.13175
This work is unique in its use of discrete wavelets built from Chebyshev polynomials of the second and third kind, namely the Discrete Second Chebyshev Wavelets Transform (DSCWT) and the Filter Discrete Third Chebyshev Wavelets Transform (FDTCWT), from which two effective filters are derived. These filters are used to analyze color images and to remove the noise and impurities that accompany an image, which is necessary because of the large amount of data that makes up the image as it is captured; such data volumes are difficult to handle during transmission. To address this issue, image compression is applied without losing information, and the obtained readings were satisfactory. Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Bits Per Pixel (BPP), and Compression Ratio (CR) are measured in the initial processing stage, while the subsequent stage performs network training of Convolutional Neural Networks (CNN) with the Discrete Second Chebyshev Wavelets Convolutional Neural Network (DSCWCNN) and the Discrete Third Chebyshev Wavelets Convolutional Neural Network (DTCWCNN) to create an efficient face recognition algorithm; the best results were achieved in accuracy and in the least amount of time. Two samples of color images were used. The proposed approach produced fast and good results, which are evident in the reported tables.
https://arxiv.org/abs/2303.13158
Microarray technology is a new and powerful tool for the concurrent monitoring of a large number of gene expressions. Each microarray experiment produces hundreds of images, and each digital image requires a large amount of storage space. Hence, real-time processing and transmission of these images necessitate efficient, custom-made lossless compression schemes. In this paper, we propose a new architecture for the lossless compression of microarray images. In this architecture, dedicated hardware separates foreground pixels from background ones. By separating these pixels and using a pipelined architecture, a higher lossless compression ratio is achieved compared to other existing methods.
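A minimal sketch of the underlying idea, under simple assumptions: foreground spots are separated from the background by a crude threshold, and the mask and the two pixel streams are compressed losslessly and separately, with zlib standing in for the dedicated hardware entropy coder.

```python
# Sketch of the core idea (not the proposed hardware pipeline): split a microarray image into
# foreground spots and background, then compress each stream losslessly on its own.
import zlib
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 4096, size=(512, 512), dtype=np.uint16)   # 12-bit microarray scan stand-in

threshold = np.percentile(img, 95)                              # crude foreground detector
foreground_mask = img > threshold

fg = img[foreground_mask]                                        # bright spot pixels
bg = img[~foreground_mask]                                       # smoother background pixels

compressed = (
    zlib.compress(np.packbits(foreground_mask).tobytes(), 9)    # mask: 1 bit per pixel before zlib
    + zlib.compress(fg.tobytes(), 9)
    + zlib.compress(bg.tobytes(), 9)
)
ratio = img.nbytes / len(compressed)
print(f"lossless compression ratio ~{ratio:.2f}:1")
```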
https://arxiv.org/abs/2303.10489
With well-selected data, homogeneous diffusion inpainting can reconstruct images from sparse data with high quality. While 4K colour images of size 3840 x 2160 can already be inpainted in real time, optimising the known data for applications like image compression remains challenging: widely used stochastic strategies can take days for a single 4K image. Recently, a first neural approach to this so-called mask optimisation problem offered high speed and good quality for small images. It trains a mask generation network with the help of a neural inpainting surrogate. However, these mask networks can only output masks for the resolution and mask density they were trained for. We solve these problems and enable mask optimisation for high-resolution images through a neuroexplicit coarse-to-fine strategy. Additionally, we improve the training and interpretability of mask networks by including a numerical inpainting solver directly in the network. This allows masks for 4K images to be generated in around 0.6 seconds while exceeding the quality of stochastic methods at practically relevant densities. Compared to popular existing approaches, this is an acceleration of up to four orders of magnitude.
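For readers unfamiliar with the reconstruction step the mask networks are trained against, here is a minimal homogeneous diffusion inpainting sketch (Jacobi iterations on the Laplace equation); the mask density, iteration count, and periodic boundary handling are simplifying assumptions, not the paper's settings.

```python
# Minimal sketch of homogeneous diffusion inpainting from a sparse mask (Jacobi iterations).
import numpy as np

def diffusion_inpaint(known_values, mask, iterations=2000):
    u = np.where(mask, known_values, known_values[mask].mean())  # initialise unknowns with the mean
    for _ in range(iterations):
        # average of the 4-neighbourhood = one Jacobi step for the Laplace equation
        avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) + np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u = np.where(mask, known_values, avg)                    # known pixels stay fixed
    return u

rng = np.random.default_rng(0)
image = rng.random((128, 128))
mask = rng.random(image.shape) < 0.05                            # ~5% of pixels are stored
reconstruction = diffusion_inpaint(image * mask, mask)
print("MSE:", float(np.mean((reconstruction - image) ** 2)))
```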
https://arxiv.org/abs/2303.10096
Variable-rate mechanisms have improved the flexibility and efficiency of learning-based image compression, which otherwise trains multiple models for different rate-distortion tradeoffs. One of the most common approaches to variable rate is to scale the internal features channel-wise or spatially uniformly. However, the spatial diversity of importance is instructive for the bit allocation of image compression. In this paper, we introduce Spatial Importance Guided Variable-rate Image Compression (SigVIC), in which a spatial gating unit (SGU) is designed to adaptively learn a spatial importance mask. A spatial scaling network (SSN) then takes the spatial importance mask to guide feature scaling and bit allocation for variable rate. Moreover, to improve the quality of the decoded image, the Top-K shallow features are selected to refine the decoded features through a shallow feature fusion module (SFFM). Experiments show that our method outperforms other learning-based methods (whether variable-rate or not) and traditional codecs, with storage savings and high flexibility.
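A hedged sketch of the gating idea: a small network predicts a per-position importance mask from the latent features and scales them so that important regions are compressed less aggressively. The layer choices and the way the rate level enters the scaling are assumptions for illustration, not the SigVIC architecture.

```python
# Sketch of a spatial importance mask guiding feature scaling (layer choices are assumed).
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels // 2, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels // 2, 1, 3, padding=1), nn.Sigmoid(),   # one importance value per position
        )

    def forward(self, features, rate_level: float):
        importance = self.net(features)                  # (B, 1, H, W) in [0, 1]
        scale = rate_level + (1.0 - rate_level) * importance
        return features * scale, importance              # important regions are scaled less aggressively

y = torch.randn(1, 192, 16, 16)                           # latent from an analysis transform
gated, mask = SpatialGatingUnit(192)(y, rate_level=0.5)
print(gated.shape, mask.shape)
```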
https://arxiv.org/abs/2303.09112
Trit-plane coding enables deep progressive image compression, but it cannot use autoregressive context models. In this paper, we propose the context-based trit-plane coding (CTC) algorithm to achieve progressive compression more compactly. First, we develop the context-based rate reduction module to estimate trit probabilities of latent elements accurately and thus encode the trit-planes compactly. Second, we develop the context-based distortion reduction module to refine partial latent tensors from the trit-planes and improve the reconstructed image quality. Third, we propose a retraining scheme for the decoder to attain better rate-distortion tradeoffs. Extensive experiments show that CTC outperforms the baseline trit-plane codec significantly in BD-rate on the Kodak lossless dataset, while increasing the time complexity only marginally. Our codes are available at this https URL.
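The sketch below illustrates trit-plane decomposition itself (not the proposed CTC context models): a non-negative quantised latent is expressed as base-3 digits so the planes can be sent progressively, each one refining the reconstruction. The plane count and latent values are illustrative.

```python
# Sketch of trit-plane decomposition of a quantised latent and its progressive refinement.
import numpy as np

def to_trit_planes(latent, num_planes):
    """Decompose non-negative integers into `num_planes` base-3 digits (most significant first)."""
    return [(latent // 3 ** p) % 3 for p in reversed(range(num_planes))]

def from_trit_planes(planes):
    value = np.zeros_like(planes[0])
    for plane in planes:                                   # most significant plane first
        value = value * 3 + plane
    return value

rng = np.random.default_rng(0)
latent = rng.integers(0, 3 ** 4, size=(4, 4))              # toy quantised latent, 4 trit-planes
planes = to_trit_planes(latent, num_planes=4)
for k in range(1, 5):                                      # progressive decoding: coarse to fine
    partial = from_trit_planes(planes[:k]) * 3 ** (4 - k)  # missing planes assumed zero
    print(f"{k} plane(s): max abs error = {np.abs(partial - latent).max()}")
```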
https://arxiv.org/abs/2303.05715
Learned image compression has exhibited promising compression performance, but variable bitrates over a wide range remain a challenge. State-of-the-art variable-rate methods compromise model performance and require numerous additional parameters. In this paper, we present a Quantization-error-aware Variable Rate Framework (QVRF) that utilizes a univariate quantization regulator $a$ to achieve wide-range variable rates within a single model. Specifically, QVRF defines a quantization regulator vector coupled with predefined Lagrange multipliers to control the quantization error of all latent representations for discrete variable rates. Additionally, a reparameterization method makes QVRF compatible with a round quantizer. Exhaustive experiments demonstrate that existing fixed-rate VAE-based methods equipped with QVRF can achieve wide-range continuous variable rates within a single model without significant performance degradation. Furthermore, QVRF outperforms contemporary variable-rate methods in rate-distortion performance with minimal additional parameters.
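A minimal sketch of the single-scalar mechanism, assuming the regulator simply scales the latent before rounding and rescales afterwards; the values of $a$ and the latent statistics are illustrative, not taken from the paper.

```python
# Sketch: a scalar quantisation regulator trades rate against distortion within one model.
import numpy as np

def quantize_with_regulator(latent, a):
    return np.round(latent * a) / a          # larger `a` -> finer steps -> higher rate, lower distortion

rng = np.random.default_rng(0)
latent = rng.normal(0.0, 2.0, size=10_000)
for a in (0.25, 1.0, 4.0):
    y_hat = quantize_with_regulator(latent, a)
    distortion = np.mean((latent - y_hat) ** 2)
    levels = len(np.unique(np.round(latent * a)))           # rough proxy for the rate
    print(f"a={a:<5} distinct levels={levels:<4} MSE={distortion:.4f}")
```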
https://arxiv.org/abs/2303.05744
Deep variational autoencoders for image and video compression have gained significant attention in recent years, due to their potential to offer competitive or better compression rates than long-established traditional codecs such as AVC, HEVC or VVC. However, because of their complexity and energy consumption, these approaches are still far from practical use in industry. More recently, implicit neural representation (INR) based codecs have emerged, which have lower complexity and energy usage than classical approaches at decoding time. However, their performance is not yet on par with state-of-the-art methods. In this research, we first show that an INR-based image codec has lower complexity than VAE-based approaches, and then propose several improvements to the INR-based image codec, outperforming the baseline model by a large margin.
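As a sketch of what an INR-based codec's encoding loop looks like, the snippet below overfits a small coordinate MLP to a single image so that the (optionally quantised) network weights become the compressed representation; the architecture, training length, and 16-bit weight assumption are illustrative, not the paper's model.

```python
# Sketch of INR-style encoding: fit a coordinate MLP to one image; its weights are the bitstream.
import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 3))

H = W = 128
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2)      # (H*W, 2) pixel coordinates
target = torch.rand(H * W, 3)                               # stand-in image

opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for step in range(500):                                     # "encoding" = fitting the network
    loss = nn.functional.mse_loss(mlp(coords), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

weights_bits = sum(p.numel() for p in mlp.parameters()) * 16  # e.g. 16-bit weights after quantisation
print(f"fit MSE={loss.item():.4f}, bitstream ~{weights_bits / (H * W):.1f} bpp")
```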
https://arxiv.org/abs/2303.03028
High-resolution (HR) images are usually downscaled to low-resolution (LR) ones for better display and afterward upscaled back to the original size to recover details. Recent work in image rescaling formulates downscaling and upscaling as a unified task and learns a bijective mapping between HR and LR via invertible networks. However, in real-world applications (e.g., social media), most images are compressed for transmission. Lossy compression will lead to irreversible information loss on LR images, hence damaging the inverse upscaling procedure and degrading the reconstruction accuracy. In this paper, we propose the Self-Asymmetric Invertible Network (SAIN) for compression-aware image rescaling. To tackle the distribution shift, we first develop an end-to-end asymmetric framework with two separate bijective mappings for high-quality and compressed LR images, respectively. Then, based on empirical analysis of this framework, we model the distribution of the lost information (including downscaling and compression) using isotropic Gaussian mixtures and propose the Enhanced Invertible Block to derive high-quality/compressed LR images in one forward pass. Besides, we design a set of losses to regularize the learned LR images and enhance the invertibility. Extensive experiments demonstrate the consistent improvements of SAIN across various image rescaling datasets in terms of both quantitative and qualitative evaluation under standard image compression formats (i.e., JPEG and WebP).
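For context, the sketch below shows the basic building block such rescaling networks rely on: an additive coupling layer that is exactly invertible, so downscaling in the forward pass can be undone in the inverse pass (barring the lossy compression applied in between). This is a generic coupling layer, not SAIN's Enhanced Invertible Block.

```python
# Sketch of an exactly invertible additive coupling layer, the core of invertible rescaling nets.
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(nn.Conv2d(channels // 2, channels // 2, 3, padding=1), nn.ReLU(),
                               nn.Conv2d(channels // 2, channels // 2, 3, padding=1))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.f(x1)], dim=1)     # y2 depends only on x1 -> invertible

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.f(y1)], dim=1)

block = AdditiveCoupling(8)
x = torch.randn(1, 8, 16, 16)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-6)  # exact reconstruction up to float error
```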
https://arxiv.org/abs/2303.02353
Typically, the metadata of images are stored in a specific data segment of the image file. However, to securely detect changes, data can also be embedded within the images themselves. The goal is to embed as much information as possible invisibly and robustly so that it ideally even survives compression. This work searches for embedding principles that allow distinguishing unintended changes caused by lossy image compression from malicious manipulation of the embedded message, based on the change of its perceptual or robust hash. Different embedding and compression algorithms are compared. The study shows that embedding a message via the integer wavelet transform and compressing with the Karhunen-Loève transform yields the best results. However, it was not possible to distinguish between manipulation and compression in all cases.
https://arxiv.org/abs/2303.00092
Recent deep-learning-based compression methods have achieved superior performance compared with traditional approaches. However, deep learning models have proven to be vulnerable to backdoor attacks, where some specific trigger patterns added to the input can lead to malicious behavior of the models. In this paper, we present a novel backdoor attack with multiple triggers against learned image compression models. Motivated by the widely used discrete cosine transform (DCT) in existing compression systems and standards, we propose a frequency-based trigger injection model that adds triggers in the DCT domain. In particular, we design several attack objectives for various attacking scenarios, including: 1) attacking compression quality in terms of bit-rate and reconstruction quality; 2) attacking task-driven measures, such as down-stream face recognition and semantic segmentation. Moreover, a novel simple dynamic loss is designed to balance the influence of different loss terms adaptively, which helps achieve more efficient training. Extensive experiments show that with our trained trigger injection models and simple modification of encoder parameters (of the compression model), the proposed attack can successfully inject several backdoors with corresponding triggers in a single image compression model.
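The sketch below illustrates, with an assumed fixed frequency band and amplitude, what adding a trigger in the DCT domain means: perturb a few mid-frequency coefficients of the image and invert the transform. The paper's trigger injection model is learned, so this hand-crafted pattern is only a stand-in.

```python
# Sketch of a DCT-domain trigger: a small, hand-picked mid-frequency perturbation (assumed values).
import numpy as np
from scipy.fft import dctn, idctn

def add_dct_trigger(image, amplitude=4.0):
    coeffs = dctn(image, norm="ortho")
    coeffs[8:12, 8:12] += amplitude                  # assumed mid-frequency trigger location
    return idctn(coeffs, norm="ortho")

rng = np.random.default_rng(0)
clean = rng.random((64, 64)) * 255.0                 # single-channel stand-in image
poisoned = add_dct_trigger(clean)
print("max pixel change:", float(np.abs(poisoned - clean).max()))
```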
https://arxiv.org/abs/2302.14677
Visual Place Recognition (VPR) is a fundamental task that allows a robotic platform to successfully localise itself in the environment. For decentralised VPR applications, where the visual data has to be transmitted between several agents, the communication channel may restrict the localisation process when limited bandwidth is available. JPEG is an image compression standard that can employ high compression ratios to facilitate lower data transmission for VPR applications. However, when high levels of JPEG compression are applied, both the image clarity and size are drastically reduced. In this paper, we incorporate sequence-based filtering into a number of well-established, learnt and non-learnt VPR techniques to overcome the performance loss resulting from introducing high levels of JPEG compression. The sequence length that enables 100% place matching performance is reported, and an analysis of the amount of data each VPR technique requires for the transfer across the entire spectrum of JPEG compression is provided. Moreover, the time required by each VPR technique to perform place matching is investigated, on both uniformly and non-uniformly JPEG compressed data. The results show that it is beneficial to use a highly compressed JPEG dataset with an increased sequence length, as similar levels of VPR performance are reached at a significantly reduced bandwidth. The results presented in this paper also emphasize that there is a trade-off between the amount of data transferred and the total time required to perform VPR. Our experiments also suggest that it is often favourable to compress the query images to the same quality as the map, as more efficient place matching can be performed. The experiments are conducted on several VPR datasets, under mild to extreme JPEG compression.
https://arxiv.org/abs/2302.13314
Recently, learned image compression schemes have achieved remarkable improvements in image fidelity (e.g., PSNR and MS-SSIM) compared to conventional hybrid image coding ones due to their high-efficiency non-linear transform, end-to-end optimization frameworks, etc. However, few of them take the Just Noticeable Difference (JND) characteristic of the Human Visual System (HVS) into account and optimize learned image compression towards perceptual quality. To address this issue, a JND-based perceptual quality loss is proposed. Considering that the amounts of distortion in the compressed image at different training epochs under different Quantization Parameters (QPs) are different, we develop a distortion-aware adjustor. After combining them together, we can better assign the distortion in the compressed image with the guidance of JND to preserve the high perceptual quality. All these designs enable the proposed method to be flexibly applied to various learned image compression schemes with high scalability and plug-and-play advantages. Experimental results on the Kodak dataset demonstrate that the proposed method has led to better perceptual quality than the baseline model under the same bit rate.
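A hedged sketch of one way a JND-weighted distortion term can look: per-pixel errors below a JND threshold map are discounted, errors above it are penalised in full. The JND map is treated as a given input here, and the weighting scheme is an assumption; the paper's loss and its distortion-aware adjustor are more elaborate.

```python
# Sketch of a JND-weighted distortion term (assumed form, not the paper's loss).
import torch

def jnd_weighted_loss(x, x_hat, jnd_map, alpha=0.1):
    err = torch.abs(x - x_hat)
    visible = torch.clamp(err - jnd_map, min=0.0)     # the part of the error the HVS can notice
    invisible = torch.minimum(err, jnd_map)           # sub-threshold error, weakly penalised
    return (visible + alpha * invisible).mean()

x = torch.rand(1, 3, 64, 64)
x_hat = x + 0.05 * torch.randn_like(x)
jnd_map = torch.full_like(x, 0.03)                    # stand-in JND threshold per pixel
print(jnd_weighted_loss(x, x_hat, jnd_map).item())
```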
https://arxiv.org/abs/2302.13092
While raw images exhibit advantages over sRGB images (e.g., linearity and fine-grained quantization level), they are not widely used by common users due to the large storage requirements. Very recent works propose to compress raw images by designing the sampling masks in the raw image pixel space, leading to suboptimal image representations and redundant metadata. In this paper, we propose a novel framework to learn a compact representation in the latent space serving as the metadata in an end-to-end manner. Furthermore, we propose a novel sRGB-guided context model with improved entropy estimation strategies, which leads to better reconstruction quality, smaller size of metadata, and faster speed. We illustrate how the proposed raw image compression scheme can adaptively allocate more bits to image regions that are important from a global perspective. The experimental results show that the proposed method can achieve superior raw image reconstruction results using a smaller size of the metadata on both uncompressed sRGB images and JPEG images.
https://arxiv.org/abs/2302.12995
Lossy face image compression can degrade the image quality and the utility for the purpose of face recognition. This work investigates the effect of lossy image compression on a state-of-the-art face recognition model, and on multiple face image quality assessment models. The analysis is conducted over a range of specific image target sizes. Four compression types are considered, namely JPEG, JPEG 2000, downscaled PNG, and notably the new JPEG XL format. Frontal color images from the ColorFERET database were used in a Region Of Interest (ROI) variant and a portrait variant. We primarily conclude that JPEG XL allows for superior mean and worst case face recognition performance especially at lower target sizes, below approximately 5kB for the ROI variant, while there appears to be no critical advantage among the compression types at higher target sizes. Quality assessments from modern models correlate well overall with the compression effect on face recognition performance.
https://arxiv.org/abs/2302.12593
With limited storage/bandwidth resources, input images to Computer Vision (CV) applications that use Deep Neural Networks (DNNs) are often encoded with JPEG, which is tailored to Human Vision (HV). This paper presents Deep Selector-JPEG, an adaptive JPEG compression method that targets image classification while satisfying HV criteria. For each image, Deep Selector-JPEG adaptively selects a Quality Factor (QF) to compress the image so that a good trade-off between Compression Ratio (CR) and DNN classifier Accuracy (Rate-Accuracy performance) is achieved over a set of images for a variety of DNN classifiers, while the MS-SSIM of the compressed image is, with high probability, greater than a threshold predetermined by HV. Deep Selector-JPEG is realised via light-weight or heavy-weight selector architectures. Experimental results show that, in comparison with JPEG at the same CR, Deep Selector-JPEG achieves better Rate-Accuracy performance over the ImageNet validation set for all tested DNN classifiers, with gains in classification accuracy between 0.2% and 1% at the same CRs while satisfying HV constraints. Deep Selector-JPEG can also roughly preserve the original classification accuracy at higher CRs.
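To make the selection step concrete, the sketch below shows a brute-force version of the idea (the paper instead trains a selector network): encode at decreasing JPEG quality factors and keep the smallest file whose similarity to the original stays above a threshold. Plain SSIM stands in for MS-SSIM, and the 0.95 threshold is an assumption.

```python
# Sketch: pick the lowest JPEG quality factor whose decoded image still meets a quality constraint.
import io
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity as ssim

def select_quality_factor(img: Image.Image, threshold=0.95, qfs=range(95, 4, -5)):
    original = np.asarray(img.convert("L"), dtype=np.float64)
    best = None
    for qf in qfs:                                       # try progressively stronger compression
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=qf)
        buf.seek(0)
        decoded = np.asarray(Image.open(buf).convert("L"), dtype=np.float64)
        if ssim(original, decoded, data_range=255.0) >= threshold:
            best = (qf, buf.getbuffer().nbytes)          # smallest file so far meeting the HV constraint
    return best

rng = np.random.default_rng(0)
img = Image.fromarray(rng.integers(0, 256, (128, 128, 3), dtype=np.uint8))
print(select_quality_factor(img))
```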
https://arxiv.org/abs/2302.09560
Recent state-of-the-art learned image compression methods feature spatial context models, achieving great rate-distortion improvements over hyperprior methods. However, the autoregressive context model requires serial decoding, limiting runtime performance. The checkerboard context model allows parallel decoding at the cost of reduced RD performance. We present a series of multistage spatial context models allowing both fast decoding and better RD performance. We split the latent space into square patches and decode serially within each patch while different patches are decoded in parallel. The proposed method features a decoding speed comparable to the checkerboard model while matching, and even surpassing, the RD performance of the autoregressive model. Inside each patch, the decoding order must be carefully chosen, as a bad order negatively impacts performance; therefore, we also propose a decoding order optimization algorithm.
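The scheduling idea can be sketched independently of the entropy model: split the latent grid into PxP patches, decode positions serially within a patch, and decode the same within-patch position across all patches in parallel. The raster order used below is a placeholder; the paper optimises this order.

```python
# Sketch of a multistage decoding schedule: serial within each patch, parallel across patches.
import numpy as np

def multistage_schedule(height, width, patch=4):
    """Return steps[i, j] = decoding stage of latent position (i, j)."""
    steps = np.zeros((height, width), dtype=int)
    for i in range(height):
        for j in range(width):
            steps[i, j] = (i % patch) * patch + (j % patch)   # within-patch raster index (placeholder order)
    return steps

steps = multistage_schedule(8, 8, patch=4)
print(steps)                                                  # 16 stages for 4x4 patches
print("parallel symbols per stage:", np.count_nonzero(steps == 0))   # one position per patch per stage
```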
https://arxiv.org/abs/2302.09263
Ultra high resolution (UHR) images are almost always downsampled to fit the small displays of mobile end devices and upsampled back to their original resolution when exhibited on very high-resolution displays. This observation motivates us to jointly optimize pairs of downsampling and upsampling operations that are spatially adaptive to image content for maximal rate-distortion performance. In this paper, we propose an adaptive downsampled dual-layer (ADDL) image compression system. In the ADDL compression system, an image is reduced in resolution by learned content-adaptive downsampling kernels and compressed to form a coded base layer. For decompression, the base layer is decoded and upconverted to the original resolution using a deep upsampling neural network, aided by prior knowledge of the learned adaptive downsampling kernels. We restrict the downsampling kernels to the form of Gabor filters in order to reduce the complexity of filter optimization and also reduce the amount of side information needed by the decoder for adaptive upsampling. Extensive experiments demonstrate that the proposed ADDL compression approach of jointly optimized, spatially adaptive downsampling and upconversion outperforms state-of-the-art image compression methods.
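The appeal of the Gabor restriction is that a downsampling kernel is then described by a handful of scalars rather than a free-form filter, which keeps the decoder's side information small. The sketch below builds such a kernel and applies it with stride 2; the parameter values and the plain strided filtering are illustrative, and the paper learns the parameters per region rather than fixing them.

```python
# Sketch: a Gabor-form downsampling kernel is fully described by ~5 scalar parameters.
import numpy as np

def gabor_kernel(size=7, theta=0.0, sigma=2.0, lambd=8.0, gamma=0.5, psi=0.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    k = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lambd + psi)
    return k / k.sum()                                    # unit DC gain for downsampling

def downsample(img, kernel, stride=2):
    half = kernel.shape[0] // 2
    padded = np.pad(img, half, mode="reflect")
    out = np.empty((img.shape[0] // stride, img.shape[1] // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = padded[i * stride:i * stride + kernel.shape[0],
                           j * stride:j * stride + kernel.shape[1]]
            out[i, j] = np.sum(patch * kernel)
    return out

img = np.random.default_rng(0).random((64, 64))
low_res = downsample(img, gabor_kernel(theta=np.pi / 4))
print(low_res.shape)          # (32, 32); the kernel itself costs only a few parameters of side information
```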
https://arxiv.org/abs/2302.06096
Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) in rate-distortion (RD) performance, but suffers from large complexity and separate models for different rate-distortion trade-offs. In this paper, we propose an Efficient single-model Variable-bit-rate Codec (EVC), which is able to run at 30 FPS with 768x512 input images and still outperforms VVC in RD performance. By further reducing both encoder and decoder complexities, our small model even achieves 30 FPS with 1920x1080 input images. To bridge the performance gap between our models of different capacities, we meticulously design the mask decay, which transforms the large model's parameters into the small model automatically. A novel sparsity regularization loss is also proposed to mitigate the shortcomings of $L_p$ regularization. Our algorithm significantly narrows the performance gap, by 50% and 30% for our medium and small models, respectively. Finally, we advocate a scalable encoder for neural image compression. The encoding complexity is dynamic to meet different latency requirements. We propose decaying the large encoder multiple times to reduce the residual representation progressively. Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder. Our code is at this https URL.
https://arxiv.org/abs/2302.05071