Image Compression for Machines (ICM) aims to compress images for machine vision tasks rather than human viewing. Current work concentrates predominantly on high-level tasks such as object detection and semantic segmentation. In the real world, however, the quality of original images is not always guaranteed, which leads to even worse perceptual quality or downstream task performance after compression. Low-level (LL) machine vision models, such as image restoration models, can improve such quality, so their compression requirements should also be considered. In this paper, we propose a pioneering ICM framework for LL machine vision tasks, namely LL-ICM. By jointly optimizing compression and LL tasks, the proposed LL-ICM not only enriches its encoding ability to generalize to versatile LL tasks but also optimizes the processing ability of downstream LL task models, achieving mutual adaptation between image codecs and LL task models. Furthermore, we integrate large-scale vision-language models into the LL-ICM framework to generate more universal and distortion-robust feature embeddings for LL vision tasks, so that one LL-ICM codec can generalize to multiple tasks. We establish a solid benchmark to evaluate LL-ICM, including extensive objective experiments with both full-reference and no-reference image quality assessment. Experimental results show that LL-ICM achieves 22.65% BD-rate reductions over state-of-the-art methods.
https://arxiv.org/abs/2412.03841
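A minimal sketch of the joint codec-plus-task objective described above, assuming a differentiable codec and a frozen low-level restoration model; the module names, return values, and weights are illustrative placeholders rather than the paper's actual interfaces:

```python
import torch

def ll_icm_loss(codec, task_model, x, x_clean, lam_rate=0.01, lam_task=1.0):
    """Joint objective: compress x, then score the downstream low-level task.

    codec(x) is assumed to return (x_hat, bits_per_pixel); task_model is a
    frozen restoration network whose output is compared against the clean
    reference x_clean. All names here are illustrative assumptions.
    """
    x_hat, bpp = codec(x)                          # reconstruction + rate estimate
    restored = task_model(x_hat)                   # run the low-level task on the reconstruction
    dist = torch.mean((x_hat - x) ** 2)            # fidelity of the reconstruction itself
    task = torch.mean((restored - x_clean) ** 2)   # quality of the downstream restoration
    return lam_rate * bpp + dist + lam_task * task
```

Optimizing the codec (and optionally the task model) against this combined loss is one plausible reading of the "mutual adaptation" idea.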
In this paper, we investigate the counter-forensic effects of the forthcoming JPEG AI standard, which is based on neural image compression, focusing on two critical areas: deepfake image detection and image splicing localization. Neural image compression leverages advanced neural network algorithms to achieve higher compression rates while maintaining image quality. However, it introduces artifacts that closely resemble those generated by image synthesis techniques and image splicing pipelines, complicating the task of discriminating pristine from manipulated content. We comprehensively analyze JPEG AI's counter-forensic effects through extensive experiments on several state-of-the-art detectors and datasets. Our results demonstrate that an increase in false alarms impairs the performance of leading forensic detectors when analyzing genuine content processed through JPEG AI. By exposing the vulnerabilities of the available forensic tools, we aim to highlight the urgent need for multimedia forensics researchers to include JPEG AI images in their experimental setups and to develop robust forensic techniques that distinguish between neural compression artifacts and actual manipulations.
https://arxiv.org/abs/2412.03261
Recent deep learning-based compression techniques have surpassed traditional methods. However, deep neural networks remain vulnerable to backdoor attacks, where pre-defined triggers induce malicious behaviors. This paper introduces a novel frequency-based trigger injection model for launching backdoor attacks with multiple triggers on learned image compression models. Inspired by the widespread use of the DCT in compression codecs, triggers are embedded in the DCT domain. We design attack objectives tailored to diverse scenarios, including: 1) degrading compression quality in terms of bit rate and reconstruction accuracy; and 2) targeting task-driven measures such as face recognition and semantic segmentation. To improve training efficiency, we propose a dynamic loss function that balances loss terms with fewer hyper-parameters, optimizing attack objectives effectively. For advanced scenarios, we evaluate the attack's resistance to defensive preprocessing and propose a two-stage training schedule with robust frequency selection to enhance resilience. To improve cross-model and cross-domain transferability for downstream tasks, we adjust the classification boundary in the attack loss during training. Experiments show that our trigger injection models, combined with minor modifications to encoder parameters, successfully inject multiple backdoors and their triggers into a single compression model, demonstrating strong performance and versatility.
https://arxiv.org/abs/2412.01646
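To make the frequency-domain trigger idea concrete, here is a hedged sketch that embeds a fixed pseudo-random pattern into mid-frequency DCT coefficients of 8x8 blocks; the paper learns its trigger, whereas this pattern and its strength are arbitrary stand-ins:

```python
import numpy as np
from scipy.fft import dctn, idctn

def inject_dct_trigger(img, strength=4.0, seed=0):
    """Embed a fixed pseudo-random pattern into mid/high-frequency DCT
    coefficients of each 8x8 block (a simplified stand-in for the paper's
    learned frequency-domain trigger). `img` is a 2-D grayscale array."""
    rng = np.random.default_rng(seed)
    trigger = rng.choice([-1.0, 1.0], size=(8, 8))
    trigger[:3, :3] = 0.0                           # leave low frequencies untouched
    out = img.astype(np.float64).copy()
    h, w = out.shape
    for i in range(0, h - h % 8, 8):
        for j in range(0, w - w % 8, 8):
            block = dctn(out[i:i+8, j:j+8], norm="ortho")
            block += strength * trigger             # additive trigger in the DCT domain
            out[i:i+8, j:j+8] = idctn(block, norm="ortho")
    return np.clip(out, 0, 255)
```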
Inspired by the success of generative image models, recent work on learned image compression increasingly focuses on better probabilistic models of the natural image distribution, leading to excellent image quality. This, however, comes at the expense of a computational complexity several orders of magnitude higher than that of today's commercial codecs, which is prohibitive for most practical applications. With this paper, we demonstrate that by focusing on modeling visual perception rather than the data distribution, we can achieve a very good trade-off between visual quality and bit rate, similar to "generative" compression models such as HiFiC, while requiring less than 1% of the multiply-accumulate operations (MACs) for decompression. We do this by optimizing C3, an overfitted image codec, for Wasserstein Distortion (WD), and evaluating the image reconstructions with a human rater study. The study also reveals that WD outperforms other perceptual quality metrics such as LPIPS, DISTS, and MS-SSIM, both as an optimization objective and as a predictor of human ratings, achieving over 94% Pearson correlation with Elo scores.
https://arxiv.org/abs/2412.00505
Scalable coding, which can adapt to channel bandwidth variation, performs well in today's complex network environment. However, most existing scalable compression methods face two challenges: reduced compression performance and insufficient scalability. To overcome the above problems, this paper proposes a learned fine-grained scalable image compression framework, namely DeepFGS. Specifically, we introduce a feature separation backbone to divide the image information into basic and scalable features, then redistribute the features channel by channel through an information rearrangement strategy. In this way, we can generate a continuously scalable bitstream via one-pass encoding. For entropy coding, we design a mutual entropy model to fully explore the correlation between the basic and scalable features. In addition, we reuse the decoder to reduce the parameters and computational complexity. Experiments demonstrate that our proposed DeepFGS outperforms previous learning-based scalable image compression models and traditional scalable image codecs in both PSNR and MS-SSIM metrics.
https://arxiv.org/abs/2412.00437
In learned image compression, probabilistic models play an essential role in characterizing the distribution of latent variables. The Gaussian model with mean and scale parameters has been widely used for its simplicity and effectiveness. Probabilistic models with more parameters, such as Gaussian mixture models, can fit the distribution of latent variables more precisely, but at correspondingly higher complexity. To balance compression performance and complexity, we extend the Gaussian model to the generalized Gaussian model for more flexible latent distribution modeling, introducing only one additional shape parameter, beta, relative to the Gaussian model. To enhance the performance of the generalized Gaussian model by alleviating the train-test mismatch, we propose improved training methods, including beta-dependent lower bounds for scale parameters and gradient rectification. Our proposed generalized Gaussian model, coupled with the improved training methods, is demonstrated to outperform the Gaussian and Gaussian mixture models on a variety of learned image compression methods.
https://arxiv.org/abs/2411.19320
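The bin-probability computation behind such an entropy model can be sketched with SciPy's generalized Gaussian (`gennorm`); the beta-dependent scale lower bounds and gradient rectification from the paper are omitted, and the probability floor used here is an arbitrary constant:

```python
import numpy as np
from scipy.stats import gennorm

def gg_bits(y_hat, beta, loc, scale):
    """Estimated code length (in bits) of integer-quantized latents y_hat under
    a generalized Gaussian with shape `beta` (beta=2 recovers the Gaussian
    family, beta=1 the Laplacian). Each symbol's probability is the mass of
    the unit-width bin around it."""
    upper = gennorm.cdf(y_hat + 0.5, beta, loc=loc, scale=scale)
    lower = gennorm.cdf(y_hat - 0.5, beta, loc=loc, scale=scale)
    p = np.clip(upper - lower, 1e-9, 1.0)    # floor probabilities, as done in practice
    return -np.log2(p).sum()

# Example: heavier-than-Gaussian tails (beta < 2) can cost fewer bits for sparse latents.
y = np.round(np.random.default_rng(0).laplace(scale=1.0, size=1000))
print(gg_bits(y, beta=1.0, loc=0.0, scale=1.0), gg_bits(y, beta=2.0, loc=0.0, scale=1.0))
```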
It is customary to deploy uniform scalar quantization, rather than more powerful vector quantization, in end-to-end optimized neural image compression methods, because of the high complexity of the latter. Lattice vector quantization (LVQ), on the other hand, presents a compelling alternative: it can exploit inter-feature dependencies more effectively while keeping computational efficiency almost the same as scalar quantization. However, traditional LVQ structures are designed and optimized for uniform source distributions, and are therefore non-adaptive and suboptimal for the real source distributions of the latent code space in neural image compression. In this paper, we propose a novel learning method to overcome this weakness by designing rate-distortion optimal lattice vector quantization (OLVQ) codebooks with respect to the sample statistics of the latent features to be compressed. By fitting the LVQ structure to any given latent sample distribution, the proposed OLVQ method significantly improves the rate-distortion performance of existing quantization schemes in neural image compression while retaining the amenability of uniform scalar quantization.
https://arxiv.org/abs/2411.16119
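A toy illustration of lattice vector quantization with a fixed basis; the paper instead learns the basis (codebook) from latent statistics in a rate-distortion-optimal way, and rounding in lattice coordinates is a common but not exact nearest-point rule:

```python
import numpy as np

def lvq_quantize(x, B):
    """Quantize row vectors of x to the lattice {B @ k : k integer}.
    Rounding in lattice coordinates approximates the nearest lattice point;
    the paper learns B from the statistics of the latents to be compressed."""
    k = np.round(np.linalg.solve(B, x.T)).T      # integer lattice coordinates
    return k @ B.T, k                            # reconstruction and indices to entropy-code

# A 2-D hexagonal-like lattice basis versus plain scalar (identity) quantization.
B_hex = np.array([[1.0, 0.5],
                  [0.0, np.sqrt(3) / 2]])
x = np.random.default_rng(0).normal(size=(5, 2))
xq_hex, _ = lvq_quantize(x, B_hex)
xq_scalar, _ = lvq_quantize(x, np.eye(2))
print(np.mean((x - xq_hex) ** 2), np.mean((x - xq_scalar) ** 2))
```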
Modern compression systems use linear transformations in their encoding and decoding processes, with transforms providing compact signal representations. While multiple data-dependent transforms for image/video coding can adapt to diverse statistical characteristics, assembling large datasets to learn each transform is challenging. Also, the resulting transforms typically lack fast implementation, leading to significant computational costs. Thus, despite many papers proposing new transform families, the most recent compression standards predominantly use traditional separable sinusoidal transforms. This paper proposes integrating a new family of Symmetry-based Graph Fourier Transforms (SBGFTs) of variable sizes into a coding framework, focusing on the extension from our previously introduced 8x8 SBGFTs to the general case of NxN grids. SBGFTs are non-separable transforms that achieve sparse signal representation while maintaining low computational complexity thanks to their symmetry properties. Their design is based on our proposed algorithm, which generates symmetric graphs on the grid by adding specific symmetrical connections between nodes and does not require any data-dependent adaptation. Furthermore, for video intra-frame coding, we exploit the correlations between optimal graphs and prediction modes to reduce the cardinality of the transform sets, thus proposing a low-complexity framework. Experiments show that SBGFTs outperform the primary transforms integrated in the explicit Multiple Transform Selection (MTS) used in the latest VVC intra-coding, providing a bit rate saving percentage of 6.23%, with only a marginal increase in average complexity. A MATLAB implementation of the proposed algorithm is available online at [1].
https://arxiv.org/abs/2411.15824
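A rough sketch of a graph Fourier transform on an NxN grid; the symmetric-connection construction of SBGFTs is only loosely imitated here through an optional list of extra edges, and the low-complexity symmetry-based implementation is not reproduced:

```python
import numpy as np

def grid_gft_basis(n, extra_edges=()):
    """Graph Fourier basis for an n x n 4-connected grid, optionally augmented
    with extra symmetric connections (edge choices here are illustrative)."""
    N = n * n
    W = np.zeros((N, N))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                W[i, i + 1] = W[i + 1, i] = 1.0   # horizontal neighbor
            if r + 1 < n:
                W[i, i + n] = W[i + n, i] = 1.0   # vertical neighbor
    for i, j in extra_edges:                      # user-chosen symmetric links
        W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W                # combinatorial Laplacian
    _, U = np.linalg.eigh(L)                      # eigenvectors = GFT basis
    return U

# Transform an 8x8 block: for smooth content, energy concentrates in few coefficients.
U = grid_gft_basis(8)
block = np.outer(np.linspace(0, 1, 8), np.ones(8)).ravel()   # smooth vertical ramp
coeffs = U.T @ block
top8 = np.sort(np.abs(coeffs))[-8:]
print("energy in 8 largest coefficients:", np.sum(top8 ** 2) / np.sum(coeffs ** 2))
```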
Lossy image compression networks aim to minimize the latent entropy of images while adhering to specific distortion constraints. However, optimizing the neural network can be challenging because it must learn quantized latent representations. In this paper, our key finding is that minimizing the latent entropy is, to some extent, equivalent to maximizing the conditional source entropy, an insight that is deeply rooted in information-theoretic equalities. Building on this insight, we propose a novel structural regularization method for the neural image compression task that incorporates the negative conditional source entropy into the training objective, so that both the optimization efficacy and the model's generalization ability are promoted. The proposed information-theoretic regularizer is interpretable, plug-and-play, and imposes no inference overhead. Extensive experiments demonstrate its superiority in regularizing models and further squeezing bits from the latent representation across various compression structures and unseen domains.
https://arxiv.org/abs/2411.16727
Real-time visual feedback is essential for tetherless control of remotely operated vehicles, particularly during inspection and manipulation tasks. Although acoustic communication is the preferred choice for medium-range communication underwater, its limited bandwidth makes it impractical to transmit images or videos in real time. To address this, we propose a model-based image compression technique that leverages prior mission information. Our approach employs trained machine-learning-based novel view synthesis models and uses gradient-descent optimization to refine latent representations, helping to generate compressible differences between camera images and rendered images. We evaluate the proposed compression technique on a dataset from an artificial ocean basin, demonstrating superior compression ratios and image quality over existing techniques. Moreover, our method is robust to the introduction of new objects within the scene, highlighting its potential for advancing tetherless remotely operated vehicle operations.
https://arxiv.org/abs/2411.13862
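One way to picture the approach: render the expected view from a prior-mission scene model, transmit only a quantized residual, and entropy-code it. The sketch below uses zlib as a stand-in entropy coder and skips the paper's latent-refinement step via gradient descent:

```python
import zlib
import numpy as np

def compress_residual(camera_img, rendered_img, q_step=8):
    """Compress only the difference between the live camera frame and a view
    rendered from a prior-mission scene model; q_step controls lossiness."""
    residual = camera_img.astype(np.int16) - rendered_img.astype(np.int16)
    q = np.round(residual / q_step).astype(np.int8)       # coarse quantization of the residual
    payload = zlib.compress(q.tobytes(), level=9)
    return payload, q.shape

def decompress_residual(payload, shape, rendered_img, q_step=8):
    """Add the dequantized residual back onto the locally rendered view."""
    q = np.frombuffer(zlib.decompress(payload), dtype=np.int8).reshape(shape)
    rec = rendered_img.astype(np.int16) + q.astype(np.int16) * q_step
    return np.clip(rec, 0, 255).astype(np.uint8)
```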
Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality by using image captions as sub-information. This paper demonstrates that using a large multi-modal model (LMM), it is possible to generate captions and compress them within a single model. We also propose a novel semantic-perceptual-oriented fine-tuning method applicable to any LIC network, resulting in a 41.58% improvement in LPIPS BD-rate compared to existing methods. Our implementation and pre-trained weights are available at this https URL.
https://arxiv.org/abs/2411.13033
We have recently witnessed that "Intelligence" and "Compression" are two sides of the same coin, where the large language model (LLM), with unprecedented intelligence, is a general-purpose lossless compressor for various data modalities. This attribute particularly appeals to the lossless image compression community, given the increasing need to compress high-resolution images in the current streaming-media era. Consequently, a natural question arises: can the compression performance of the LLM elevate lossless image compression to new heights? However, our findings indicate that the naive application of LLM-based lossless image compressors suffers from a considerable performance gap compared with existing state-of-the-art (SOTA) codecs on common benchmark datasets. In light of this, we are dedicated to realizing the unprecedented intelligence (compression) capacity of the LLM for lossless image compression tasks, thereby bridging the gap between theoretical and practical compression performance. Specifically, we propose P²-LLM, a next-pixel prediction-based LLM, which integrates various elaborated insights and methodologies, e.g., pixel-level priors, the in-context ability of LLMs, and a pixel-level semantic preservation strategy, to enhance the understanding of pixel sequences for better next-pixel predictions. Extensive experiments on benchmark datasets demonstrate that P²-LLM can beat SOTA classical and learned codecs.
https://arxiv.org/abs/2411.12448
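The underlying principle, independent of the specific model, is that any autoregressive next-pixel predictor paired with an arithmetic coder achieves a code length close to its negative log-likelihood. The sketch below computes that bound with a toy adaptive-histogram predictor standing in for the LLM:

```python
import numpy as np

def autoregressive_codelength(pixels, predict_probs):
    """Ideal lossless code length (bits) of a pixel sequence under an
    autoregressive predictor; an arithmetic coder paired with the model
    approaches this bound. `predict_probs(prefix)` is any callable returning
    a 256-way distribution over the next pixel value."""
    total = 0.0
    for t, px in enumerate(pixels):
        p = predict_probs(pixels[:t])
        total += -np.log2(max(p[px], 1e-12))
    return total

def adaptive_freq_model(prefix, alpha=1.0):
    """Toy stand-in predictor: Laplace-smoothed histogram of the prefix."""
    counts = np.full(256, alpha)
    for px in prefix:
        counts[px] += 1
    return counts / counts.sum()

pixels = [128, 129, 130, 128, 127, 129]
print(autoregressive_codelength(pixels, adaptive_freq_model), "bits vs", 8 * len(pixels), "raw")
```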
Adversarial robustness of neural networks is an increasingly important area of research, combining studies on computer vision models, large language models (LLMs), and others. With the release of JPEG AI - the first standard for end-to-end neural image compression (NIC) methods - the question of its robustness has become critically significant. JPEG AI is among the first international, real-world applications of neural-network-based models to be embedded in consumer devices. However, research on NIC robustness has been limited to open-source codecs and a narrow range of attacks. This paper proposes a new methodology for measuring NIC robustness to adversarial attacks. We present the first large-scale evaluation of JPEG AI's robustness, comparing it with other NIC models. Our evaluation results and code are publicly available online (link is hidden for a blind review).
https://arxiv.org/abs/2411.11795
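A generic robustness probe of the kind such evaluations build on can be sketched as projected gradient descent against a differentiable codec; the specific attack objectives, codecs, and measurement protocol used in the paper are not reproduced here:

```python
import torch

def pgd_attack_codec(codec, x, eps=2/255, step=0.5/255, iters=10):
    """Untargeted PGD on a differentiable image codec: perturb the input within
    an L-infinity ball to maximize reconstruction error of the decoded image.
    `codec` is any end-to-end-differentiable module returning a reconstruction."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = torch.mean((codec(x_adv) - x) ** 2)      # push the decoded output away from the source
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()          # signed gradient ascent step
            x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()
```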
Recent advances in neural camera imaging pipelines have demonstrated notable progress. Nevertheless, the real-world imaging pipeline still faces challenges, including the lack of joint optimization across system components, computational redundancy, and optical distortions such as lens distortion. In light of this, we propose an end-to-end camera imaging pipeline (RealCamNet) to enhance real-world camera imaging performance. Our methodology diverges from conventional, fragmented multi-stage image signal processing towards an end-to-end architecture. This architecture facilitates joint optimization across the full pipeline and the restoration of coordinate-biased distortions. RealCamNet is designed for high-quality conversion from RAW to RGB and compact image compression. Specifically, we deeply analyze coordinate-dependent optical distortions, e.g., vignetting and dark shading, and design a novel Coordinate-Aware Distortion Restoration (CADR) module to restore coordinate-biased distortions. Furthermore, we propose a Coordinate-Independent Mapping Compression (CIMC) module to implement tone mapping and redundant-information compression. Existing datasets suffer from misalignment and overly idealized conditions, making them inadequate for training real-world imaging pipelines; therefore, we collected a real-world imaging dataset. Experimental results show that RealCamNet achieves the best rate-distortion performance with lower inference latency.
https://arxiv.org/abs/2411.10773
Learned progressive image compression is gaining momentum, as it allows improved image reconstruction as more bits are decoded at the receiver. We propose a progressive image compression method in which an image is first represented as a pair of base-quality and top-quality latent representations. Next, a residual latent representation is encoded as the element-wise difference between the top and base representations. Our scheme enables progressive image compression with element-wise granularity by introducing a masking system that ranks each element of the residual latent representation from most to least important, dividing it into complementary components that can be transmitted separately to the decoder to obtain different reconstruction qualities. The masking system adds no further parameters or complexity. At the receiver, any elements of the top latent representation excluded from the transmitted components can be independently replaced with the mean predicted by the hyperprior architecture, ensuring reliable reconstructions at any intermediate quality level. We also introduce Rate Enhancement Modules (REMs), which refine the estimation of entropy parameters using already-decoded components. We obtain results competitive with the state of the art while significantly reducing computational complexity, decoding time, and number of parameters.
https://arxiv.org/abs/2411.10185
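A hedged sketch of the masking-and-fallback mechanism: elements of the top latent are ranked by a hand-made importance score, split into nested components, and anything not yet received falls back to the hyperprior mean. The paper's actual masking system and REMs are not modeled here:

```python
import numpy as np

def split_into_components(y_top, mu, sigma, fractions=(0.25, 0.5, 1.0)):
    """Rank elements of the top-quality latent by an illustrative importance
    score (deviation from the hyperprior mean, in scale units) and split the
    indices into nested components transmitted one after another."""
    importance = np.abs(y_top - mu) / (sigma + 1e-9)
    order = np.argsort(importance.ravel())[::-1]           # most important first
    cuts = [int(f * order.size) for f in fractions]
    return [order[a:b] for a, b in zip([0] + cuts[:-1], cuts)]

def decode_with_fallback(y_top, mu, components, n_received):
    """Keep received elements; replace everything not yet transmitted with the
    hyperprior-predicted mean, as in the fallback rule described above."""
    y = mu.copy().ravel()
    for idx in components[:n_received]:
        y[idx] = y_top.ravel()[idx]
    return y.reshape(mu.shape)
```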
Learning-based image compression methods have improved in recent years and started to outperform traditional codecs. However, neural-network approaches can unexpectedly introduce visual artifacts in some images. We therefore propose methods to separately detect three types of artifacts (texture and boundary degradation, color change, and text corruption), to localize the affected regions, and to quantify the artifact strength. We consider only those regions that exhibit distortion due solely to the neural compression but that a traditional codec recovers successfully at a comparable bitrate. We employed our methods to collect artifacts for the JPEG AI verification model with respect to HM-18.0, the H.265 reference software. We processed about 350,000 unique images from the Open Images dataset using different compression-quality parameters; the result is a dataset of 46,440 artifacts validated through crowd-sourced subjective assessment. Our proposed dataset and methods are valuable for testing neural-network-based image codecs, identifying bugs in these codecs, and enhancing their performance. We make source code of the methods and the dataset publicly available.
https://arxiv.org/abs/2411.06810
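A crude proxy for the selection rule described above (regions distorted by the neural codec but recovered by the traditional one at a comparable bitrate) can be written as a block-wise error comparison; the thresholds are arbitrary, and the paper's artifact-type-specific detectors are far more elaborate:

```python
import numpy as np

def flag_neural_artifacts(orig, neural_rec, trad_rec, block=32, ratio=4.0, floor=10.0):
    """Flag blocks where the neural reconstruction is much worse than a
    traditional codec's reconstruction of the same content.
    Inputs are grayscale float arrays of identical shape."""
    h, w = orig.shape
    flags = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            sl = (slice(i, i + block), slice(j, j + block))
            e_neural = np.mean((orig[sl] - neural_rec[sl]) ** 2)
            e_trad = np.mean((orig[sl] - trad_rec[sl]) ** 2)
            if e_neural > ratio * e_trad and e_neural > floor:   # distorted mainly by the neural codec
                flags.append((i, j, e_neural, e_trad))
    return flags
```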
Medical image compression is a widely studied area of data processing owing to the prevalence of medical imagery in modern digital databases. The domain requires a high color depth of 12 bits per pixel component for accurate analysis by physicians, primarily in the DICOM format. Standard raster-based compression of images via filtering is well known; however, it remains suboptimal in the medical domain because of non-specialized implementations. This study proposes a lossless medical image compression algorithm, CompaCT, that targets spatial features and patterns of pixel concentration for dynamically enhanced data processing. The algorithm employs fractal pixel traversal coupled with a novel approach to segmentation and meshing between pixel blocks for preprocessing. Delta and entropy coding are then applied to complete the compression pipeline. We demonstrate that the data compression achieved via fractal segmentation preprocessing yields improved image compression results while remaining lossless in its reconstruction accuracy. CompaCT's compression ratios are evaluated on 3,954 high-color-depth CT scans against industry-standard compression techniques (i.e., JPEG2000, RLE, ZIP, PNG). Its reconstruction performance is assessed with error metrics to verify lossless image recovery after decompression. The results show that CompaCT compresses and losslessly reconstructs medical images while being 37% more space-efficient than industry-standard compression systems.
https://arxiv.org/abs/2308.13097
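A simplified sketch of the pipeline's shape: traverse pixels along a space-filling curve (Morton order is used here only as a stand-in for the paper's fractal traversal), delta-code along the curve so that spatial locality keeps differences small, and entropy-code the deltas; the segmentation and meshing steps are omitted:

```python
import zlib
import numpy as np

def morton_order(n):
    """Z-order (Morton) traversal of an n x n grid, n a power of two."""
    def interleave(x, y):
        z = 0
        for b in range(n.bit_length()):
            z |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
        return z
    coords = [(r, c) for r in range(n) for c in range(n)]
    return sorted(coords, key=lambda rc: interleave(rc[1], rc[0]))

def traversal_delta_compress(img):
    """Delta-code pixel values along the space-filling curve, then entropy-code
    with zlib (a stand-in for a tuned entropy coder). Lossless: cumulative sums
    of the deltas recover the original traversal sequence."""
    n = img.shape[0]
    seq = np.array([int(img[r, c]) for r, c in morton_order(n)], dtype=np.int32)
    deltas = np.diff(seq, prepend=0).astype(np.int16)      # first delta carries the first value
    return zlib.compress(deltas.tobytes(), level=9)
```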
In recent years, learned image compression methods have demonstrated superior rate-distortion performance compared to traditional image compression methods. Recent methods utilize convolutional neural networks (CNN), variational autoencoders (VAE), invertible neural networks (INN), and transformers. Despite their significant contributions, a main drawback of these models is their poor performance in capturing local redundancy. Therefore, to leverage global features along with local redundancy, we propose a CNN-based solution integrated with a feature encoding module. The feature encoding module encodes important features before feeding them to the CNN and then utilizes cross-scale window-based attention, which further captures local redundancy. Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field. Both the feature encoding module and the cross-scale window-based attention module in our architecture are flexible and can be incorporated into any other network architecture. We evaluate our method on the Kodak and CLIC datasets and demonstrate that our approach is effective and on par with state-of-the-art methods.
https://arxiv.org/abs/2410.21144
In lossy image compression, models face the challenge of either hallucinating details or generating out-of-distribution samples due to the information bottleneck. This implies that at times, introducing hallucinations is necessary to generate in-distribution samples. The optimal level of hallucination varies depending on image content, as humans are sensitive to small changes that alter the semantic meaning. We propose a novel compression method that dynamically balances the degree of hallucination based on content. We collect data and train a model to predict user preferences on hallucinations. By using this prediction to adjust the perceptual weight in the reconstruction loss, we develop a Conditionally Hallucinating compression model (ConHa) that outperforms state-of-the-art image compression methods. Code and images are available at this https URL.
https://arxiv.org/abs/2410.19493
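A hedged sketch of a content-conditioned perceptual weighting of the reconstruction loss; `pref_model` and `perc_metric` are placeholders for the learned preference predictor and a perceptual metric such as LPIPS, not the paper's actual components:

```python
import torch

def conditional_hallucination_loss(x, x_hat, bpp, pref_model, perc_metric, lam_rate=0.01):
    """Rate-distortion loss whose perceptual term is scaled per image by a
    predicted tolerance for hallucinated detail. Inputs x, x_hat are NCHW
    tensors; bpp is the (per-image) rate estimate from the codec."""
    w = torch.sigmoid(pref_model(x)).view(-1)              # higher -> more hallucination allowed
    mse = torch.mean((x_hat - x) ** 2, dim=(1, 2, 3))      # pixel fidelity term
    perc = perc_metric(x_hat, x).view(-1)                  # perceptual term to be re-weighted
    return (lam_rate * bpp + mse + w * perc).mean()
```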
In this work, we propose a unified representation for Super-Resolution (SR) and Image Compression, termed **Factorized Fields**, motivated by the shared principles between the two tasks. Both single-image super-resolution (SISR) and image compression require recovering and preserving fine image details, whether by enhancing resolution or by reconstructing compressed data. Unlike previous methods that mainly focus on network architecture, our approach utilizes a basis-coefficient decomposition to explicitly capture multi-scale visual features and structural components in images, addressing the core challenges of both tasks. We first derive our SR model, which includes a Coefficient Backbone and a Basis Swin Transformer for generalizable Factorized Fields. Then, to further unify the two tasks, we leverage the strong information-recovery capability of the trained SR modules as priors in the compression pipeline, improving both compression efficiency and detail reconstruction. Additionally, we introduce a merged-basis compression branch that consolidates shared structures, further optimizing the compression process. Extensive experiments show that our unified representation delivers state-of-the-art performance, achieving an average relative improvement of 204.4% in PSNR over the baseline in Super-Resolution (SR) and a 9.35% BD-rate reduction in Image Compression compared to the previous SOTA.
https://arxiv.org/abs/2410.18083
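As a generic illustration of a basis-coefficient decomposition (not the paper's learned Factorized Fields), a truncated SVD splits a patch into shared basis vectors and per-patch coefficients:

```python
import numpy as np

def factorize_patch(patch, rank=8):
    """Basis-coefficient decomposition of an image patch via truncated SVD:
    a generic low-rank factorization used only to illustrate the idea of
    splitting content into bases and coefficients (the paper learns both
    with dedicated networks)."""
    U, s, Vt = np.linalg.svd(patch.astype(np.float64), full_matrices=False)
    basis = Vt[:rank]                        # structural components (rows are basis vectors)
    coeffs = U[:, :rank] * s[:rank]          # per-patch coefficients
    return coeffs, basis

# Smooth content is well approximated by a few basis-coefficient pairs.
rng = np.random.default_rng(0)
patch = np.outer(np.linspace(0, 1, 32), np.linspace(1, 0, 32)) + 0.01 * rng.normal(size=(32, 32))
coeffs, basis = factorize_patch(patch, rank=4)
recon = coeffs @ basis
print("relative error:", np.linalg.norm(patch - recon) / np.linalg.norm(patch))
```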