Supervised deep learning techniques can be used to generate synthetic 7T MRIs from 3T MRI inputs. This image enhancement process leverages the advantages of ultra-high-field MRI to improve the signal-to-noise and contrast-to-noise ratios of 3T acquisitions. In this paper, we introduce multiple novel 7T synthesization algorithms based on custom-designed variants of the V-Net convolutional neural network. We demonstrate that the V-Net based model has superior performance in enhancing both single-site and multi-site MRI datasets compared to the existing benchmark model. When trained on 3T-7T MRI pairs from 8 subjects with mild Traumatic Brain Injury (TBI), our model achieves state-of-the-art 7T synthesization performance. Compared to previous works, synthetic 7T images generated from our pipeline also display superior enhancement of pathological tissue. Additionally, we implement and test a data augmentation scheme for training models that are robust to variations in the input distribution. This allows synthetic 7T models to accommodate intra-scanner and inter-scanner variability in multisite datasets. On a harmonized dataset consisting of 18 3T-7T MRI pairs from two institutions, including both healthy subjects and those with mild TBI, our model maintains its performance and can generalize to 3T MRI inputs with lower resolution. Our findings demonstrate the promise of V-Net based models for MRI enhancement and offer a preliminary probe into improving the generalizability of synthetic 7T models with data augmentation.
https://arxiv.org/abs/2403.08979
In this paper, we present a novel fog-aware object detection network called FogGuard, designed to address the challenges posed by foggy weather conditions. Autonomous driving systems heavily rely on accurate object detection algorithms, but adverse weather conditions can significantly impact the reliability of deep neural networks (DNNs). Existing approaches fall into two main categories: 1) image enhancement, such as IA-YOLO, and 2) domain-adaptation based approaches. Image enhancement based techniques attempt to generate a fog-free image. However, retrieving a fog-free image from a foggy image is a much harder problem than detecting objects in a foggy image. Domain-adaptation based approaches, on the other hand, do not make use of labelled datasets in the target domain. Both categories of approaches thus attempt to solve a harder version of the problem. Our approach instead builds on fine-tuning the baseline detector: our framework is specifically designed to compensate for the foggy conditions present in the scene, ensuring robust performance even in the presence of fog. We adopt YOLOv3 as the baseline object detection algorithm and introduce a novel Teacher-Student Perceptual loss to achieve high-accuracy object detection in foggy images. Through extensive evaluations on common datasets such as PASCAL VOC and RTTS, we demonstrate the improvement in performance achieved by our network. FogGuard achieves 69.43\% mAP on the RTTS dataset, compared to 57.78\% for YOLOv3. Furthermore, we show that while our training method increases time complexity, it does not introduce any additional overhead during inference compared to the regular YOLO network.
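The abstract does not spell out the loss, but a teacher-student perceptual loss of this kind is typically computed by matching intermediate detector features on clear versus foggy views of the same scene. The sketch below illustrates that idea in PyTorch; the `features(...)` interface and the MSE feature-matching choice are assumptions for illustration, not FogGuard's actual implementation.

```python
import torch
import torch.nn as nn

class TeacherStudentPerceptualLoss(nn.Module):
    """Match intermediate features of a frozen teacher (run on clear images)
    with those of the student (run on the corresponding foggy images).
    The `features(x)` method returning a list of feature maps is a
    hypothetical interface, not part of the FogGuard code base."""

    def __init__(self, teacher: nn.Module):
        super().__init__()
        self.teacher = teacher.eval()
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.mse = nn.MSELoss()

    def forward(self, student: nn.Module, foggy: torch.Tensor, clear: torch.Tensor):
        student_feats = student.features(foggy)
        with torch.no_grad():
            teacher_feats = self.teacher.features(clear)
        # Sum the feature-matching losses over the selected layers.
        return sum(self.mse(s, t) for s, t in zip(student_feats, teacher_feats))
```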
https://arxiv.org/abs/2403.08939
Dark image enhancement aims at converting dark images to normal-light images. Existing dark image enhancement methods take uncompressed dark images as inputs and achieve strong performance. However, in practice, dark images are often compressed before storage or transmission over the Internet. Current methods perform poorly when processing compressed dark images: artifacts hidden in the dark regions are amplified, resulting in uncomfortable visual effects for observers. Based on this observation, this study aims at enhancing compressed dark images while avoiding the amplification of compression artifacts. Since texture details intertwine with compression artifacts in compressed dark images, detail enhancement and blocking-artifact suppression contradict each other in image space. Therefore, we handle the task in latent space. To this end, we propose a novel latent mapping network based on a variational auto-encoder (VAE). Firstly, different from previous VAE-based methods with single-resolution features only, we exploit multiple latent spaces with multi-resolution features to reduce detail blur and improve image fidelity. Specifically, we train two multi-level VAEs to project compressed dark images and normal-light images into their respective latent spaces. Secondly, we leverage a latent mapping network to transform features from the compressed dark space to the normal-light space. Specifically, since the degradation models of darkness and compression differ from each other, the latent mapping process is divided into an enlightening branch and a deblocking branch. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art performance in compressed dark image enhancement.
https://arxiv.org/abs/2403.07622
Masked autoencoder (MAE) shows that severe augmentation during training produces robust representations for high-level tasks. This paper brings the MAE-like framework to nighttime image enhancement, demonstrating that severe augmentation during training produces strong network priors that are resilient to real-world night haze degradations. We propose a novel nighttime image dehazing method with self-prior learning. Our main novelty lies in the design of severe augmentation, which allows our model to learn robust priors. Unlike MAE, which uses masking, we leverage two key challenging factors of nighttime images as augmentation: light effects and noise. During training, we intentionally degrade clear images by blending them with light effects as well as by adding noise, and subsequently restore the clear images. This enables our model to learn clear background priors. By increasing the noise values until they approach the pixel intensities of the glow- and light-effect-blended images, our augmentation becomes severe, resulting in stronger priors. While our self-prior learning is considerably effective in suppressing glow and revealing details of background scenes, in some cases some undesired artifacts remain, particularly in the form of over-suppression. To address these artifacts, we propose a self-refinement module based on the semi-supervised teacher-student framework. Our NightHaze, especially our MAE-like self-prior learning, shows that models trained with severe augmentation effectively improve the visibility of input haze images, approaching the clarity of clear nighttime images. Extensive experiments demonstrate that our NightHaze achieves state-of-the-art performance, outperforming existing nighttime image dehazing methods by a substantial margin of 15.5% for MUSIQ and 23.5% for ClipIQA.
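As a rough illustration of what this severe augmentation could look like in code, the sketch below degrades a clear nighttime image by blending in a light-effect map and adding strong noise; the network would then be trained to recover the clear image. The blending weight and noise level are illustrative placeholders, not the paper's settings.

```python
import torch

def severe_night_augmentation(clear, light_effect, noise_level=0.5, blend=0.6):
    """Degrade a clear nighttime image by blending in a light-effect map and
    adding strong noise; the model is trained to restore `clear`.

    clear, light_effect: tensors in [0, 1] with shape (B, 3, H, W).
    A noise_level close to the intensity of the blended light effects makes
    the augmentation "severe" (values here are illustrative, not the paper's)."""
    degraded = (1.0 - blend) * clear + blend * light_effect          # glow / light-effect blending
    degraded = degraded + noise_level * torch.randn_like(degraded)   # heavy additive noise
    return degraded.clamp(0.0, 1.0)
```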
https://arxiv.org/abs/2403.07408
Contemporary no-reference image quality assessment (NR-IQA) models can effectively quantify the perceived image quality, with high correlations between model predictions and human perceptual scores on fixed test sets. However, little progress has been made in comparing NR-IQA models from a perceptual optimization perspective. Here, for the first time, we demonstrate that NR-IQA models can be plugged into the maximum a posteriori (MAP) estimation framework for image enhancement. This is achieved by taking the gradients in differentiable and bijective diffusion latents rather than in the raw pixel domain. Different NR-IQA models are likely to induce different enhanced images, which are ultimately subject to psychophysical testing. This leads to a new computational method for comparing NR-IQA models within the analysis-by-synthesis framework. Compared to conventional correlation-based metrics, our method provides complementary insights into the relative strengths and weaknesses of the competing NR-IQA models in the context of perceptual optimization.
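A minimal sketch of the optimization loop this implies: treat the NR-IQA prediction as the objective and take gradient steps in a differentiable latent representation rather than in pixel space, decoding back to an image at each step. `decoder` and `iqa_model` are placeholder callables, and the MAP prior term is omitted here, so this is only an assumption-laden approximation of the paper's procedure.

```python
import torch

def enhance_via_iqa_gradients(latent, decoder, iqa_model, steps=50, lr=0.05):
    """Gradient ascent on a differentiable NR-IQA score, taken w.r.t. a
    (bijective) latent code rather than raw pixels. `decoder` maps the latent
    back to image space; both models are assumed to be differentiable."""
    z = latent.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        image = decoder(z)                 # back to pixel space
        loss = -iqa_model(image).mean()    # maximize the predicted quality score
        loss.backward()
        opt.step()
    return decoder(z).detach()
```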
https://arxiv.org/abs/2403.06406
Computed Tomography (CT) is a widely used medical imaging modality, and as it is based on ionizing radiation, it is desirable to minimize the radiation dose. However, a reduced radiation dose comes with reduced image quality, and reconstruction from low-dose CT (LDCT) data remains a challenging and actively researched task. According to the LoDoPaB-CT benchmark, a benchmark for LDCT reconstruction, many state-of-the-art methods use pipelines involving UNet-type architectures. Specifically, the top-ranking method, ItNet, employs a three-stage process involving filtered backprojection (FBP), a UNet trained on CT data, and an iterative refinement step. In this paper, we propose a less complex two-stage method. The first stage also employs FBP, while the novelty lies in the training strategy for the second stage, which we characterize as the CT image enhancement stage. The crucial point of our approach is that the neural network is pretrained on a distinctly different pretraining task with non-CT data, namely Gaussian noise removal on a variety of natural grayscale images (photographs). We then fine-tune this network for the downstream task of CT image enhancement using pairs of LDCT images and corresponding normal-dose CT (NDCT) images. Despite being notably simpler than the state of the art, as the pretraining does not depend on domain-specific CT data and no further iterative refinement step is necessary, the proposed two-stage method achieves competitive results. The proposed method achieves a shared top ranking in the LoDoPaB-CT challenge and first place with respect to the SSIM metric.
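The two training stages can be summarized as below: a sketch, assuming plain MSE objectives and hypothetical data loaders, of pretraining on synthetic Gaussian denoising of natural grayscale photographs and then fine-tuning on paired LDCT/NDCT slices.

```python
import torch
import torch.nn as nn

def pretrain_on_gaussian_denoising(model, natural_gray_loader, epochs=1, sigma=0.1, lr=1e-4):
    """Stage 1: remove synthetic Gaussian noise from natural grayscale
    photographs (no CT data involved). `natural_gray_loader` yields clean
    (B, 1, H, W) crops; sigma and lr are illustrative values."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for clean in natural_gray_loader:
            noisy = clean + sigma * torch.randn_like(clean)
            loss = loss_fn(model(noisy), clean)
            opt.zero_grad()
            loss.backward()
            opt.step()

def finetune_on_ldct(model, ldct_loader, epochs=1, lr=1e-5):
    """Stage 2: enhance FBP reconstructions of low-dose CT toward the
    corresponding normal-dose CT images (paired slices)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for ldct_fbp, ndct in ldct_loader:
            loss = loss_fn(model(ldct_fbp), ndct)
            opt.zero_grad()
            loss.backward()
            opt.step()
```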
https://arxiv.org/abs/2403.03551
Diffusion model-based low-light image enhancement methods rely heavily on paired training data, which limits their broader application. Meanwhile, existing unsupervised methods lack effective bridging capabilities for unknown degradations. To address these limitations, we propose a novel zero-reference lighting-estimation diffusion model for low-light image enhancement called Zero-LED. It utilizes the stable convergence ability of diffusion models to bridge the gap between the low-light domain and the real normal-light domain, and successfully alleviates the dependence on paired training data via zero-reference learning. Specifically, we first design an initial optimization network to preprocess the input image and implement bidirectional constraints between the diffusion model and the initial optimization network through multiple objective functions. Subsequently, the degradation factors of the real-world scene are optimized iteratively to achieve effective light enhancement. In addition, we explore a frequency-domain based and semantically guided appearance reconstruction module that encourages feature alignment of the recovered image at a fine-grained level and satisfies subjective expectations. Finally, extensive experiments demonstrate the superiority of our approach over other state-of-the-art methods, as well as its stronger generalization capability. We will release the source code upon acceptance of the paper.
https://arxiv.org/abs/2403.02879
Ultrasound is a widely used medical tool for non-invasive diagnosis, but its images often contain speckle noise, which can lower their resolution and contrast-to-noise ratio. This can make it more difficult to extract, recognize, and analyze features in the images, as well as impair the accuracy of computer-assisted diagnostic techniques and the ability of doctors to interpret the images. Reducing speckle noise, therefore, is a crucial step in the preprocessing of ultrasound images. Researchers have proposed several speckle reduction methods, but no single method takes all relevant factors into account. In this paper, we compare seven such methods: Median, Gaussian, Bilateral, Average, Wiener, Anisotropic, and a denoising auto-encoder without and with skip connections, in terms of their ability to preserve features and edges while effectively reducing noise. In an experimental study, a convolutional noise-removing auto-encoder with skip connections, a deep learning method, was used to improve ultrasound images of breast cancer; speckle noise was added at various levels. The results of the deep learning method were compared to those of traditional image enhancement methods, and the proposed method was found to be more effective. To assess the performance of these algorithms, we use three established evaluation metrics and present both filtered images and statistical data.
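Off-the-shelf implementations of most of the classical filters in this comparison exist in SciPy and scikit-image; the sketch below adds speckle noise to a test image and reports PSNR/SSIM for several of them. The anisotropic-diffusion filter and the autoencoders are omitted, and the test image and parameter values are illustrative stand-ins, not the study's ultrasound data or settings.

```python
from scipy import ndimage, signal
from skimage import data, img_as_float, util
from skimage.restoration import denoise_bilateral
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

clean = img_as_float(data.camera())                       # stand-in for an ultrasound frame
noisy = util.random_noise(clean, mode="speckle", var=0.05)

filtered = {
    "median":    ndimage.median_filter(noisy, size=3),
    "gaussian":  ndimage.gaussian_filter(noisy, sigma=1.0),
    "average":   ndimage.uniform_filter(noisy, size=3),
    "wiener":    signal.wiener(noisy, mysize=3),
    "bilateral": denoise_bilateral(noisy, sigma_color=0.1, sigma_spatial=3),
}

for name, out in filtered.items():
    psnr = peak_signal_noise_ratio(clean, out, data_range=1.0)
    ssim = structural_similarity(clean, out, data_range=1.0)
    print(f"{name:10s}  PSNR={psnr:5.2f} dB  SSIM={ssim:.3f}")
```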
https://arxiv.org/abs/2403.02750
Underwater visuals undergo various complex degradations, inevitably influencing the efficiency of underwater vision tasks. Recently, diffusion models have been employed for underwater image enhancement (UIE) tasks and have achieved SOTA performance. However, these methods fail to consider the physical properties and underwater imaging mechanisms in the diffusion process, limiting the information-completion capacity of diffusion models. In this paper, we introduce a novel UIE framework, named PA-Diff, designed to exploit the knowledge of physics to guide the diffusion process. PA-Diff consists of a Physics Prior Generation (PPG) branch and a Physics-aware Diffusion Transformer (PDT) branch. Our designed PPG branch is a plug-and-play network that produces the physics prior and can be integrated into any deep framework. By utilizing the physics prior knowledge to guide the diffusion process, the PDT branch can obtain underwater-aware ability and model the complex distribution in real-world underwater scenes. Extensive experiments prove that our method achieves the best performance on UIE tasks.
https://arxiv.org/abs/2403.01497
Prior-based methods for low-light image enhancement often face challenges in extracting available prior information from dim images. To overcome this limitation, we introduce a simple yet effective Retinex model with the proposed edge extraction prior. More specifically, we design an edge extraction network to capture the fine edge features from the low-light image directly. Building upon the Retinex theory, we decompose the low-light image into its illumination and reflectance components and introduce an edge-guided Retinex model for enhancing low-light images. To solve the proposed model, we propose a novel inertial Bregman alternating linearized minimization algorithm. This algorithm addresses the optimization problem associated with the edge-guided Retinex model, enabling effective enhancement of low-light images. Through rigorous theoretical analysis, we establish the convergence properties of the algorithm; in particular, using nonconvex optimization theory, we prove that it converges to a stationary point of the problem. Furthermore, extensive experiments are conducted on multiple real-world low-light image datasets to demonstrate the efficiency and superiority of the proposed scheme.
https://arxiv.org/abs/2403.01142
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution, which heavily rely on precisely aligned paired datasets with pixel-level alignments. However, creating precisely aligned paired images presents significant challenges and hinders the advancement of methods trained on such data. To overcome this challenge, this paper introduces a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain. Specifically, we transform image features into the frequency domain using Discrete Fourier Transformation (DFT). Subsequently, frequency components (amplitude and phase) are processed separately to form the FDL loss function. Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain. Extensive experimental evaluations, focusing on image enhancement and super-resolution tasks, demonstrate that FDL outperforms existing misalignment-robust loss functions. Furthermore, we explore the potential of our FDL for image style transfer that relies solely on completely misaligned data. Our code is available at: this https URL
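A minimal sketch of the core idea, comparing amplitude and phase after a 2D DFT of the feature maps, is shown below; the actual FDL computes a distribution distance over these frequency components, so the simple L1 comparison here is only an approximation under that assumption.

```python
import torch

def frequency_distribution_loss(pred_feat, target_feat):
    """Sketch of an FDL-style loss: compare amplitude and phase of feature
    maps in the frequency domain, which is more tolerant of small spatial
    misalignments than a pixel-wise loss. The paper's exact formulation
    (a distribution distance over frequency components) may differ."""
    pred_f = torch.fft.fft2(pred_feat, norm="ortho")
    tgt_f = torch.fft.fft2(target_feat, norm="ortho")
    amp_loss = (pred_f.abs() - tgt_f.abs()).abs().mean()
    phase_loss = (torch.angle(pred_f) - torch.angle(tgt_f)).abs().mean()
    return amp_loss + phase_loss
```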
https://arxiv.org/abs/2402.18192
Human vision relies heavily on available ambient light to perceive objects. Low-light scenes pose two distinct challenges: information loss due to insufficient illumination and undesirable brightness shifts. Low-light image enhancement (LLIE) refers to image enhancement technology tailored to handle this scenario. We introduce CPGA-Net, an innovative LLIE network that combines dark/bright channel priors and gamma correction via deep learning and integrates features inspired by the Atmospheric Scattering Model and the Retinex Theory. This approach combines the use of traditional and deep learning methodologies, designed within a simple yet efficient architectural framework that focuses on essential feature extraction. The resulting CPGA-Net is a lightweight network with only 0.025 million parameters and 0.030 seconds for inference time, yet it achieves superior performance over existing LLIE methods on both objective and subjective evaluation criteria. Furthermore, we utilized knowledge distillation with explainable factors and proposed an efficient version that achieves 0.018 million parameters and 0.006 seconds for inference time. The proposed approaches inject new solution ideas into LLIE, providing practical applications in challenging low-light scenarios.
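The dark/bright channel priors and gamma correction that CPGA-Net builds on are simple to compute on their own; the sketch below shows standard NumPy/SciPy versions of these ingredients (patch size and gamma value are illustrative), while their learned combination inside the network is not reproduced here.

```python
import numpy as np
from scipy.ndimage import minimum_filter, maximum_filter

def channel_priors(image, patch=15):
    """Dark and bright channel priors: per-pixel min/max over the RGB channels
    followed by a local min/max over a patch (image in [0, 1], shape H x W x 3)."""
    dark = minimum_filter(image.min(axis=2), size=patch)
    bright = maximum_filter(image.max(axis=2), size=patch)
    return dark, bright

def gamma_correct(image, gamma=0.5):
    """Global gamma correction used as a simple brightness prior
    (gamma < 1 brightens the image)."""
    return np.clip(image, 0.0, 1.0) ** gamma
```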
https://arxiv.org/abs/2402.18147
Image enhancement algorithms are very useful for real world computer vision tasks where image resolution is often physically limited by the sensor size. While state-of-the-art deep neural networks show impressive results for image enhancement, they often struggle to enhance real-world images. In this work, we tackle a real-world setting: inpainting of images from Dunhuang caves. The Dunhuang dataset consists of murals, half of which suffer from corrosion and aging. These murals feature a range of rich content, such as Buddha statues, bodhisattvas, sponsors, architecture, dance, music, and decorative patterns designed by different artists spanning ten centuries, which makes manual restoration challenging. We modify two different existing methods (CAR, HINet) that are based upon state-of-the-art (SOTA) super resolution and deblurring networks. We show that those can successfully inpaint and enhance these deteriorated cave paintings. We further show that a novel combination of CAR and HINet, resulting in our proposed inpainting network (ARIN), is very robust to external noise, especially Gaussian noise. To this end, we present a quantitative and qualitative comparison of our proposed approach with existing SOTA networks and winners of the Dunhuang challenge. One of the proposed methods (HINet) represents the new state of the art and outperforms the 1st place of the Dunhuang Challenge, while our combination ARIN, which is robust to noise, is comparable to the 1st place. We also present and discuss qualitative results showing the impact of our method for inpainting on Dunhuang cave images.
https://arxiv.org/abs/2402.16188
This paper presents, for the first time, an image enhancement methodology designed to enhance the clarity of small intestinal villi in Wireless Capsule Endoscopy (WCE) images. This method first separates the low-frequency and high-frequency components of small intestinal villi images using guided filtering. Subsequently, an adaptive light gain factor is generated based on the low-frequency component, and an adaptive gradient gain factor is derived from the convolution results of the Laplacian operator in different regions of small intestinal villi images. The obtained light gain factor and gradient gain factor are then combined to enhance the high-frequency components. Finally, the enhanced high-frequency component is fused with the original image to achieve adaptive sharpening of the edges of WCE small intestinal villi images. The experiments affirm that, compared to established WCE image enhancement methods, our approach not only accentuates the edge details of WCE small intestine villi images but also skillfully suppresses noise amplification, thereby preventing the occurrence of edge overshooting.
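To make the pipeline concrete, the sketch below implements a self-guided filter for the low/high-frequency split and a simplified adaptive sharpening step; the paper's specific gain formulas are not given in the abstract, so the brightness- and Laplacian-based gains here are illustrative placeholders.

```python
import numpy as np
from scipy.ndimage import uniform_filter, laplace

def guided_filter(guide, src, radius=8, eps=1e-3):
    """Edge-preserving smoothing (He et al.), used self-guided here to obtain
    the low-frequency component of a grayscale frame with values in [0, 1]."""
    def box(x):
        return uniform_filter(x, size=2 * radius + 1)
    mean_i, mean_p = box(guide), box(src)
    cov_ip = box(guide * src) - mean_i * mean_p
    var_i = box(guide * guide) - mean_i * mean_i
    a = cov_ip / (var_i + eps)
    b = mean_p - a * mean_i
    return box(a) * guide + box(b)

def adaptive_sharpen(image, light_strength=1.0, grad_strength=1.0):
    """Split the image into low/high-frequency parts, scale the detail by
    gains derived from local brightness and the Laplacian response, and add
    it back (the gain formulas below are illustrative, not the paper's)."""
    low = guided_filter(image, image)
    high = image - low
    light_gain = 1.0 + light_strength * (1.0 - low)        # boost detail more in darker regions
    lap = np.abs(laplace(image))                           # local gradient response
    grad_gain = 1.0 + grad_strength * lap / (lap.max() + 1e-6)
    return np.clip(image + light_gain * grad_gain * high, 0.0, 1.0)
```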
https://arxiv.org/abs/2402.15977
In this work, we observe that generators pre-trained on massive natural images inherently hold promising potential for superior low-light image enhancement across varying scenarios. Specifically, we embed a pre-trained generator into the Retinex model to produce reflectance maps with enhanced detail and vividness, thereby recovering features degraded by low-light conditions. Taking one step further, we introduce a novel optimization strategy, which backpropagates the gradients to the input seeds rather than the parameters of the low-light enhancement model, thus leaving the generative knowledge learned from natural images intact and achieving faster convergence. Benefiting from the pre-trained knowledge and the seed-optimization strategy, the low-light enhancement model can significantly regularize the realness and fidelity of the enhanced result, thus rapidly generating high-quality images without training on any low-light dataset. Extensive experiments on various benchmarks demonstrate the superiority of the proposed method over numerous state-of-the-art methods qualitatively and quantitatively.
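A minimal sketch of this seed-optimization strategy: the pre-trained generator stays frozen and the enhancement loss is backpropagated only to its input seed. The generator, loss function, and step count below are placeholders for whatever Retinex-based objective the paper actually uses.

```python
import torch

def optimize_seed(generator, loss_fn, low_light_img, seed_shape, steps=200, lr=0.1):
    """Keep the pre-trained generator frozen and backpropagate the enhancement
    loss to its input seed only, so the generative prior learned from natural
    images stays intact while the output adapts to the given low-light image."""
    for p in generator.parameters():
        p.requires_grad_(False)
    seed = torch.randn(seed_shape, requires_grad=True)
    opt = torch.optim.Adam([seed], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        reflectance = generator(seed)                  # candidate reflectance map
        loss = loss_fn(reflectance, low_light_img)     # e.g. a Retinex-based fidelity term
        loss.backward()
        opt.step()
    return generator(seed).detach()
```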
https://arxiv.org/abs/2402.09694
The Low-Light Image Enhancement (LLIE) task aims to restore the details and visual information of corrupted low-light images. Most existing methods learn the mapping function between low/normal-light images with Deep Neural Networks (DNNs) in the sRGB and HSV color spaces. Nevertheless, enhancement involves amplifying image signals, and applying these color spaces to low-light images with a low signal-to-noise ratio can introduce sensitivity and instability into the enhancement process. Consequently, this results in the presence of color artifacts and brightness artifacts in the enhanced images. To alleviate this problem, we propose a novel trainable color space, named Horizontal/Vertical-Intensity (HVI). It not only decouples brightness and color from the RGB channels to mitigate instability during enhancement, but also adapts to low-light images in different illumination ranges thanks to its trainable parameters. Further, we design a novel Color and Intensity Decoupling Network (CIDNet) with two branches dedicated to processing the decoupled image brightness and color in the HVI space. Within CIDNet, we introduce the Lightweight Cross-Attention (LCA) module to facilitate interaction between image structure and content information in both branches, while also suppressing noise in low-light images. Finally, we conducted 22 quantitative and qualitative experiments to show that the proposed CIDNet outperforms the state-of-the-art methods on 11 datasets. The code will be available at this https URL.
https://arxiv.org/abs/2402.05809
Low-light image enhancement (LLIE) restores the color and brightness of underexposed images. Supervised methods suffer from high costs in collecting low/normal-light image pairs. Unsupervised methods invest substantial effort in crafting complex loss functions. We address these two challenges through the proposed TroubleMaker Learning (TML) strategy, which employs normal-light images as inputs for training. TML is simple: we first dim the input and then increase its brightness. TML is based on two core components. First, the troublemaker model (TM) constructs pseudo low-light images from normal images to relieve the cost of pairwise data. Second, the predicting model (PM) enhances the brightness of pseudo low-light images. Additionally, we incorporate an enhancing model (EM) to further improve the visual quality of PM outputs. Moreover, in LLIE tasks, characterizing global element correlations is important because more information about the same object can be captured. CNNs cannot achieve this well, and self-attention has high time complexity. Accordingly, we propose Global Dynamic Convolution (GDC) with O(n) time complexity, which essentially imitates the partial calculation process of self-attention to formulate elementwise correlations. Based on the GDC module, we build the UGDC model. Extensive quantitative and qualitative experiments demonstrate that UGDC trained with TML can achieve competitive performance against state-of-the-art approaches on public datasets. The code is available at this https URL.
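In the paper the "troublemaker" is itself a learned model, but the basic dim-then-brighten idea can be sketched with a hand-crafted degradation: darken a normal-light batch with a random gamma and mild noise to form the pseudo low-light input for the predicting model. The parameter values below are illustrative only.

```python
import torch

def make_pseudo_low_light(normal, gamma_range=(2.0, 5.0), noise_std=0.02):
    """Darken a normal-light batch (B, 3, H, W) in [0, 1] with a random
    per-image gamma > 1 plus mild noise; the predicting model is then trained
    to brighten the result back toward `normal`. This hand-crafted degradation
    stands in for the learned troublemaker model in the paper."""
    gamma = torch.empty(normal.size(0), 1, 1, 1).uniform_(*gamma_range)
    dark = normal.clamp(0.0, 1.0) ** gamma
    dark = (dark + noise_std * torch.randn_like(dark)).clamp(0.0, 1.0)
    return dark
```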
https://arxiv.org/abs/2402.04584
Visual text, a pivotal element in both document and scene images, speaks volumes and attracts significant attention in the computer vision domain. Beyond visual text detection and recognition, the field of visual text processing has experienced a surge in research, driven by the advent of fundamental generative models. However, challenges persist due to the unique properties and features that distinguish text from general objects. Effectively leveraging these unique textual characteristics is crucial in visual text processing, as observed in our study. In this survey, we present a comprehensive, multi-perspective analysis of recent advancements in this field. Initially, we introduce a hierarchical taxonomy encompassing areas ranging from text image enhancement and restoration to text image manipulation, followed by different learning paradigms. Subsequently, we conduct an in-depth discussion of how specific textual features such as structure, stroke, semantics, style, and spatial context are seamlessly integrated into various tasks. Furthermore, we explore available public datasets and benchmark the reviewed methods on several widely-used datasets. Finally, we identify principal challenges and potential avenues for future research. Our aim is to establish this survey as a fundamental resource, fostering continued exploration and innovation in the dynamic area of visual text processing.
https://arxiv.org/abs/2402.03082
Image restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions to guide the image restoration model. Given natural language prompts, our model can recover high-quality images from their degraded counterparts, considering multiple degradation types. Our method, InstructIR, achieves state-of-the-art results on several restoration tasks including image denoising, deraining, deblurring, dehazing, and (low-light) image enhancement. InstructIR improves +1dB over previous all-in-one restoration methods. Moreover, our dataset and results represent a novel benchmark for new research on text-guided image restoration and enhancement. Our code, datasets and models are available at: this https URL
https://arxiv.org/abs/2401.16468
Underwater image enhancement (UIE) is challenging since image degradation in aquatic environments is complicated and changes over time. Existing mainstream methods rely on either physical models or data-driven learning, and suffer from performance bottlenecks due to changes in imaging conditions or training instability. In this article, we make the first attempt to adapt the diffusion model to the UIE task and propose a Content-Preserving Diffusion Model (CPDM) to address the above challenges. CPDM first leverages a diffusion model as its fundamental model for stable training and then designs a content-preserving framework to deal with changes in imaging conditions. Specifically, we construct a conditional input module that adopts both the raw image and the difference between the raw and noisy images as the input, which enhances the model's adaptability to the changes the raw images undergo in underwater environments. To preserve the essential content of the raw images, we construct a content compensation module for content-aware training by extracting low-level features from the raw images. Extensive experimental results validate the effectiveness of our CPDM, surpassing the state-of-the-art methods in terms of both subjective and objective metrics.
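The conditional input described here is straightforward to assemble; a sketch of the channel concatenation is given below, with the channel layout being an assumption rather than the paper's exact configuration.

```python
import torch

def build_condition(raw, noisy_sample):
    """Content-preserving conditioning sketch: concatenate the raw underwater
    image with the difference between the raw image and the current noisy
    diffusion sample, so the denoiser sees both the image content and how far
    the sample has drifted from it (channel layout is an assumption)."""
    return torch.cat([raw, raw - noisy_sample], dim=1)
```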
https://arxiv.org/abs/2401.15649