In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirements, and slow style switching. Our method is based on two core designs. First, we propose Deterministic Neural Color Mapping (DNCM) to consistently operate on each pixel via an image-adaptive color mapping matrix, avoiding artifacts and supporting high-resolution inputs with a small memory footprint. Second, we develop a two-stage pipeline that divides the task into color normalization and stylization, which allows efficient style switching by extracting color styles as presets and reusing them on normalized input images. Since pairwise datasets are unavailable, we describe how to train Neural Preset via a self-supervised strategy. Comprehensive evaluations demonstrate various advantages of Neural Preset over existing methods. In addition, we show that our trained model can naturally support multiple applications without fine-tuning, including low-light image enhancement, underwater image correction, image dehazing, and image harmonization.
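As a rough illustration of the DNCM idea, the sketch below applies a single image-adaptive color mapping matrix to every pixel. The 3x3 "warm" matrix here is a hypothetical hand-written preset, not one predicted by Neural Preset's encoder, and the paper's actual mapping dimensionality may differ.

```python
# Minimal sketch: apply one color mapping matrix uniformly to all pixels.
import numpy as np

def apply_color_mapping(image: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply the same 3x3 color transform T to every pixel of an HxWx3 image in [0,1]."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c)        # (H*W, 3): one row per pixel
    mapped = pixels @ T                  # identical linear map for every pixel
    return np.clip(mapped, 0.0, 1.0).reshape(h, w, c)

# Hypothetical "warm" preset matrix (illustrative only).
warm_T = np.array([[1.10, 0.00, 0.00],
                   [0.00, 1.00, 0.00],
                   [0.00, 0.00, 0.90]])
img = np.random.rand(256, 256, 3).astype(np.float32)
styled = apply_color_mapping(img, warm_T)
```

Because the same matrix is applied everywhere, memory grows only with the matrix size rather than the image resolution, which is the property the abstract highlights.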
https://arxiv.org/abs/2303.13511
Images taken under low-light conditions tend to suffer from poor visibility, which can degrade image quality and even reduce the performance of downstream tasks. It is hard for a CNN-based method to learn generalized features that can recover normal images from ones captured under various unknown low-light conditions. In this paper, we propose to incorporate contrastive learning into an illumination correction network to learn abstract representations that distinguish various low-light conditions in the representation space, with the purpose of enhancing the generalizability of the network. Considering that lighting conditions can change the frequency components of images, the representations are learned and compared in both the spatial and frequency domains to take full advantage of contrastive learning. The proposed method is evaluated on the LOL and LOL-V2 datasets; the results show that it achieves better qualitative and quantitative results than other state-of-the-art methods.
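A minimal sketch of contrasting representations in both domains follows. It assumes an InfoNCE-style objective (the abstract does not specify the loss) and uses raw tensors in place of encoder features; the actual networks and projection heads are omitted.

```python
# Sketch: contrastive term applied to spatial tensors and their FFT magnitude views.
import torch
import torch.nn.functional as F

def frequency_view(x: torch.Tensor) -> torch.Tensor:
    """Log-magnitude spectrum of an NCHW batch (frequency-domain view)."""
    spec = torch.fft.fft2(x, norm="ortho")
    return torch.log1p(spec.abs())

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.1):
    """InfoNCE: each anchor should match its own positive within the batch."""
    a = F.normalize(anchor.flatten(1), dim=1)
    p = F.normalize(positive.flatten(1), dim=1)
    logits = a @ p.t() / temperature                  # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Two augmented views of the same low-light batch (placeholders for encoder features).
view1, view2 = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
loss = info_nce(view1, view2) + info_nce(frequency_view(view1), frequency_view(view2))
```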
https://arxiv.org/abs/2303.13412
How to generate the ground-truth (GT) image is a critical issue for training realistic image super-resolution (Real-ISR) models. Existing methods mostly take a set of high-resolution (HR) images as GTs and apply various degradations to simulate their low-resolution (LR) counterparts. Though great progress has been achieved, such an LR-HR pair generation scheme has several limitations. First, the perceptual quality of the HR images may not be high enough, limiting the quality of Real-ISR outputs. Second, existing schemes do not consider much human perception in GT generation, and the trained models tend to produce over-smoothed results or unpleasant artifacts. With the above considerations, we propose a human-guided GT generation scheme. We first elaborately train multiple image enhancement models to improve the perceptual quality of the HR images, enabling one LR image to have multiple HR counterparts. Human subjects are then asked to annotate the high-quality regions among the enhanced HR images as GTs and to label the regions with unpleasant artifacts as negative samples. A human-guided GT image dataset with both positive and negative samples is then constructed, and a loss function is proposed to train the Real-ISR models. Experiments show that the Real-ISR models trained on our dataset can produce perceptually more realistic results with fewer artifacts. The dataset and code can be found at this https URL
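To make the positive/negative-sample idea concrete, here is a hypothetical region-wise loss: it pulls predictions toward the annotated high-quality GT and pushes them away from annotated artifact regions. This is not the paper's actual loss function, only an illustration built from simple L1 and hinge terms.

```python
# Hypothetical sketch of a positive/negative-sample training objective.
import torch
import torch.nn.functional as F

def pos_neg_loss(pred, positive_gt, negative_sample, neg_mask, margin=0.05):
    """Pull pred toward annotated GT; repel it from annotated artifact regions."""
    attract = F.l1_loss(pred, positive_gt)
    # Inside artifact regions (neg_mask == 1), penalize predictions that stay
    # within `margin` (per-pixel L1) of the negative sample.
    dist_to_neg = (pred - negative_sample).abs().mean(dim=1, keepdim=True)
    repel = (neg_mask * F.relu(margin - dist_to_neg)).mean()
    return attract + repel

pred = torch.rand(2, 3, 64, 64)
pos_gt = torch.rand(2, 3, 64, 64)
neg = torch.rand(2, 3, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.7).float()   # human-annotated artifact mask (stand-in)
loss = pos_neg_loss(pred, pos_gt, neg, mask)
```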
https://arxiv.org/abs/2303.13069
Three factors restrict the application of existing low-light image enhancement methods: unpredictable brightness degradation and noise, the inherent gap between metric-favorable and visually friendly versions, and limited paired training data. To address these limitations, we propose an implicit Neural Representation method for Cooperative low-light image enhancement, dubbed NeRCo. It robustly recovers perceptually friendly results in an unsupervised manner. Concretely, NeRCo unifies the diverse degradation factors of real-world scenes with a controllable fitting function, leading to better robustness. In addition, for the output results, we introduce semantically oriented supervision with priors from a pre-trained vision-language model. Instead of merely following reference images, it encourages results to meet subjective expectations, finding more visually friendly solutions. Further, to ease the reliance on paired data and reduce the solution space, we develop a dual-closed-loop constrained enhancement module. It is trained cooperatively with other affiliated modules in a self-supervised manner. Finally, extensive experiments demonstrate the robustness and superior effectiveness of our proposed NeRCo. Our code is available at this https URL.
https://arxiv.org/abs/2303.11722
Low-light image enhancement (LLIE) techniques attempt to increase the visibility of images captured in low-light scenarios. However, as a result of enhancement, a variety of image degradations such as noise and color bias are revealed. Furthermore, each particular LLIE approach may introduce a different form of flaw into its enhanced results. To combat these image degradations, post-processing denoisers have been widely used, but they often yield oversmoothed results lacking detail. We propose using a diffusion model as a post-processing approach and introduce the Low-light Post-processing Diffusion Model (LPDM) to model the conditional distribution between under-exposed and normally-exposed images. We apply LPDM in a manner that avoids the computationally expensive generative reverse process of typical diffusion models, and post-process images in one pass through LPDM. Extensive experiments demonstrate that our approach outperforms competing post-processing denoisers by increasing the perceptual quality of enhanced low-light images on a variety of challenging low-light datasets. Source code is available at this https URL.
https://arxiv.org/abs/2303.09627
With improvements in sensor technology and significant algorithmic advances, the accuracy of remote heart rate monitoring has improved substantially. Despite these advances, the performance of rPPG algorithms can still degrade during long-term, high-intensity continuous work in the evening or in poorly lit environments. One of the main challenges is that lost facial details and low contrast cause detection and tracking to fail. In addition, insufficient lighting during video capture hurts the quality of the physiological signal. In this paper, we collect a large-scale dataset designed for remote heart rate estimation, recorded under various illumination conditions, to evaluate the performance of rPPG algorithms (Green, ICA, and POS). We also propose a low-light enhancement solution for remote heart rate estimation under low-light conditions. Using the collected dataset, we found that 1) face detection algorithms cannot detect faces in videos captured under low-light conditions; 2) a decrease in the amplitude of the pulsatile signal causes the noise signal to become dominant; and 3) the chrominance-based method suffers because its assumption about skin tone no longer holds, while the Green and ICA methods are less affected than POS in dark illumination environments. The proposed solution for the rPPG process effectively detects the pulsatile signal and improves its signal-to-noise ratio and precision.
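For context on the "Green" baseline evaluated above, here is a minimal sketch of green-channel rPPG: average the green channel over a face ROI per frame, band-pass to the pulse band, and read the dominant spectral peak. Face ROI detection is assumed to happen elsewhere, and the 30 fps frame rate is an assumption.

```python
# Sketch: green-channel rPPG heart-rate estimate from a stack of face crops.
import numpy as np
from scipy.signal import butter, filtfilt

def green_rppg_hr(roi_frames: np.ndarray, fps: float = 30.0) -> float:
    """roi_frames: (T, H, W, 3) face crops; returns estimated heart rate in bpm."""
    g = roi_frames[..., 1].reshape(roi_frames.shape[0], -1).mean(axis=1)
    g = g - g.mean()
    # Band-pass 0.7-4 Hz (42-240 bpm), the usual pulse band.
    b, a = butter(3, [0.7 / (fps / 2), 4.0 / (fps / 2)], btype="band")
    pulse = filtfilt(b, a, g)
    spectrum = np.abs(np.fft.rfft(pulse))
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
    return float(freqs[np.argmax(spectrum)] * 60.0)

hr = green_rppg_hr(np.random.rand(300, 64, 64, 3), fps=30.0)
```

In low light, the pulsatile amplitude of the green trace shrinks, so the spectral peak is easily overtaken by noise, which is exactly finding 2) in the abstract.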
https://arxiv.org/abs/2303.09336
An image processing unit (IPU), or image signal processor (ISP), for high dynamic range (HDR) imaging usually consists of demosaicing, white balancing, lens shading correction, color correction, denoising, and tone-mapping. Besides noise from the imaging sensors, almost every step in the ISP introduces or amplifies noise in different ways, and denoising operators are designed to reduce the noise from these sources. Designed for dynamic range compression, tone-mapping operators in an ISP can significantly amplify the noise level, especially for images captured in low-light conditions, making denoising very difficult. Therefore, we propose a joint multi-scale denoising and tone-mapping framework that is designed with both operations in mind for HDR images. Our joint network is trained end-to-end, optimizing both operators together to prevent the tone-mapping operator from overwhelming the denoising operator. Our model outperforms existing HDR denoising and tone-mapping operators both quantitatively and qualitatively on most of our benchmarking datasets.
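A small worked example of why tone mapping makes denoising harder: a simple gamma-style tone curve (a stand-in, not the paper's operator) applies its largest gain to the darkest pixels, so any noise there is amplified the most after dynamic range compression.

```python
# Sketch: local gain of a gamma tone curve is largest for dark inputs.
import numpy as np

def tone_map(x: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    return np.power(np.clip(x, 1e-6, 1.0), 1.0 / gamma)

x = np.linspace(0.01, 1.0, 5)                        # linear HDR intensities
gain = (tone_map(x + 1e-3) - tone_map(x)) / 1e-3     # local slope of the curve
print(np.round(gain, 2))                             # slope is largest for the darkest inputs
```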
https://arxiv.org/abs/2303.09071
When enhancing low-light images, many deep learning algorithms are based on the Retinex theory. However, the Retinex model does not consider the corruptions hidden in the dark or introduced by the light-up process. Besides, these methods usually require a tedious multi-stage training pipeline and rely on convolutional neural networks, showing limitations in capturing long-range dependencies. In this paper, we formulate a simple yet principled One-stage Retinex-based Framework (ORF). ORF first estimates the illumination information to light up the low-light image and then restores the corruption to produce the enhanced image. We design an Illumination-Guided Transformer (IGT) that utilizes illumination representations to direct the modeling of non-local interactions of regions with different lighting conditions. By plugging IGT into ORF, we obtain our algorithm, Retinexformer. Comprehensive quantitative and qualitative experiments demonstrate that our Retinexformer significantly outperforms state-of-the-art methods on seven benchmarks. The user study and application on low-light object detection also reveal the latent practical values of our method. Codes and pre-trained models will be released.
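For intuition about the first ORF stage, below is a minimal classical Retinex-style light-up: a channel-maximum illumination prior followed by division. Retinexformer instead learns the illumination and additionally restores the corruptions revealed by lighting up; this hand-crafted version is only an illustration of the decomposition.

```python
# Sketch: classical Retinex light-up with a channel-max illumination prior.
import numpy as np

def retinex_light_up(img: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """img: HxWx3 in [0,1]. Returns a brightened, reflectance-like image I / (L + eps)."""
    illumination = img.max(axis=2, keepdims=True)    # rough illumination map
    lit = img / (illumination + eps)
    return np.clip(lit, 0.0, 1.0)

low_light = np.random.rand(128, 128, 3) * 0.2        # synthetic dark image
brightened = retinex_light_up(low_light)
```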
https://arxiv.org/abs/2303.06705
The challenges in recovering underwater images are the presence of diverse degradation factors and the lack of ground-truth images. Although synthetic underwater image pairs can be used to overcome the problem of insufficient observed data, they may result in over-fitting and enhancement degradation. This paper proposes a model-based deep learning method for restoring clean images under various underwater scenarios, which exhibits good interpretability and generalization ability. More specifically, we build a multi-variable convolutional neural network model to estimate the clean image, background light, and transmission map, respectively. An efficient loss function is also designed to closely integrate the variables based on the underwater image model. A meta-learning strategy is used to obtain a pre-trained model on a synthetic underwater dataset, which contains different types of degradation to cover various underwater environments. The pre-trained model is then fine-tuned on real underwater datasets to obtain a reliable underwater image enhancement model, called MetaUE. Numerical experiments demonstrate that the pre-trained model has good generalization ability, allowing it to remove the color degradation of various underwater attenuation images, such as blue, green, and yellow casts. Fine-tuning enables the model to adapt to different underwater datasets, and its enhancement results outperform state-of-the-art underwater image restoration methods. All our code and data are available at this https URL.
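The sketch below shows the simplified underwater image formation model that ties the three estimated variables together, and a consistency check that re-synthesizes the observation from them. This is a hedged reading of how such a model-based loss could be set up, not MetaUE's exact loss.

```python
# Sketch: simplified underwater image model  I = J * t + B * (1 - t).
import numpy as np

def synthesize(clean: np.ndarray, transmission: np.ndarray, background: np.ndarray):
    """clean: HxWx3, transmission: HxWx1 in (0,1], background light: (3,)."""
    return clean * transmission + background.reshape(1, 1, 3) * (1.0 - transmission)

def model_consistency_loss(observed, clean, transmission, background):
    resynth = synthesize(clean, transmission, background)
    return float(np.mean(np.abs(observed - resynth)))   # L1 consistency term

J = np.random.rand(64, 64, 3)                            # clean image
t = np.clip(np.random.rand(64, 64, 1), 0.1, 1.0)         # transmission map
B = np.array([0.1, 0.5, 0.4])                            # greenish background light
I = synthesize(J, t, B)
print(model_consistency_loss(I, J, t, B))                # ~0 when the variables agree
```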
https://arxiv.org/abs/2303.06543
The quality of a fundus image can be compromised by numerous factors, many of which are challenging to model appropriately and mathematically. In this paper, we introduce a novel diffusion-model-based framework, named Learning Enhancement from Degradation (LED), for enhancing fundus images. Specifically, we first adopt a data-driven degradation framework to learn degradation mappings from unpaired high-quality to low-quality images. We then apply a conditional diffusion model to learn the inverse enhancement process in a paired manner. The proposed LED is able to output enhancement results that maintain clinically important features with better clarity. Moreover, in the inference phase, LED can be easily and effectively integrated with any existing fundus image enhancement framework. We evaluate the proposed LED on several downstream tasks with respect to various clinically relevant metrics, successfully demonstrating its superiority over existing state-of-the-art methods both quantitatively and qualitatively. The source code is available at this https URL.
https://arxiv.org/abs/2303.04603
Medical imaging plays a significant role in detecting and treating various diseases. However, these images are often of poor quality, leading to decreased efficiency, extra expense, and even incorrect diagnoses. Therefore, we propose a retinal image enhancement method using a vision transformer and a convolutional neural network. It builds a cycle-consistent generative adversarial network that relies on unpaired datasets. It consists of two generators that translate images from one domain to another (e.g., low- to high-quality and vice versa), playing an adversarial game with two discriminators. The generators aim to produce images that the discriminators, which try to tell generated images from original ones, cannot distinguish. The generators are a combination of a vision transformer (ViT) encoder and a convolutional neural network (CNN) decoder; the discriminators are traditional CNN encoders. The resulting improved images have been tested quantitatively using evaluation metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), and qualitatively, e.g., via vessel segmentation. The proposed method successfully reduces the adverse effects of blurring, noise, illumination disturbances, and color distortions while significantly preserving structural and color information. Experimental results show the superiority of the proposed method. Our testing PSNR is 31.138 dB for the first dataset and 27.798 dB for the second; testing SSIM is 0.919 and 0.904, respectively.
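For reference, the PSNR figures quoted above are typically computed as follows (a minimal sketch, assuming images normalized to [0,1] so the peak value is 1.0):

```python
# Sketch: PSNR in dB between a reference image and a test image.
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 1.0) -> float:
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

ref = np.random.rand(256, 256, 3)
noisy = np.clip(ref + 0.01 * np.random.randn(*ref.shape), 0, 1)
print(f"PSNR: {psnr(ref, noisy):.2f} dB")
```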
https://arxiv.org/abs/2303.01939
Underwater images typically experience mixed degradations of brightness and structure caused by the absorption and scattering of light by suspended particles. To address this issue, we propose a Real-time Spatial and Frequency Domains Modulation Network (RSFDM-Net) for the efficient enhancement of colors and details in underwater images. Specifically, our proposed conditional network is designed with an Adaptive Fourier Gating Mechanism (AFGM) and a Multiscale Convolutional Attention Module (MCAM) to generate vectors carrying low-frequency background information and high-frequency detail features, which effectively help the network model global background information and local texture details. To more precisely correct the color cast and low saturation of the image, we introduce a Three-branch Feature Extraction (TFE) block in the primary net that processes images pixel by pixel to integrate the color information extended by the same channel (R, G, or B). This block consists of three small branches, each of which has its own weights. Extensive experiments demonstrate that our network significantly outperforms state-of-the-art methods in both visual quality and quantitative metrics.
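As a hedged sketch of frequency-domain gating in the spirit of AFGM (the real module is more involved and its details are not given in the abstract), the block below takes the FFT of a feature map, scales the frequency components with a learnable complex gate in GFNet style, and transforms back.

```python
# Sketch: GFNet-style learnable frequency gate as an illustration of Fourier gating.
import torch
import torch.nn as nn

class FourierGate(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable complex weight per channel/frequency bin.
        self.gate = nn.Parameter(torch.randn(channels, height, width // 2 + 1, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (N, C, H, W)
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = spec * torch.view_as_complex(self.gate)     # gate each frequency component
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

feat = torch.rand(2, 16, 32, 32)
print(FourierGate(16, 32, 32)(feat).shape)   # torch.Size([2, 16, 32, 32])
```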
https://arxiv.org/abs/2302.12186
Ultra-High-Definition (UHD) photos have gradually become the standard configuration in advanced imaging devices. The new standard unveils many issues in existing approaches for low-light image enhancement (LLIE), especially in dealing with the intricate problem of joint luminance enhancement and noise removal while remaining efficient. Unlike existing methods that address the problem in the spatial domain, we propose a new solution, UHDFour, that embeds the Fourier transform into a cascaded network. Our approach is motivated by a few unique characteristics of the Fourier domain: 1) most luminance information concentrates in amplitudes while noise is closely related to phases, and 2) a high-resolution image and its low-resolution version share similar amplitude patterns. By embedding the Fourier transform into our network, the amplitude and phase of a low-light image are processed separately to avoid amplifying noise when enhancing luminance. Besides, UHDFour is scalable to UHD images by performing amplitude and phase enhancement in the low-resolution regime and then adjusting the high-resolution scale with few computations. We also contribute the first real UHD LLIE dataset, UHD-LL, which contains 2,150 low-noise/normal-clear 4K image pairs with diverse darkness and noise levels captured in different scenarios. With this dataset, we systematically analyze the performance of existing LLIE methods for processing UHD images and demonstrate the advantage of our solution. We believe our new framework, coupled with the dataset, will push the frontier of LLIE towards UHD. The code and dataset are available at this https URL.
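A small experiment behind the motivation above: split an image into Fourier amplitude and phase, then recombine the low-light image's phase with the amplitude of a brightened copy. The brightness change is carried by the amplitude while the phase (structure) is untouched; the crude scaling-based "enhancement" here is only for demonstration.

```python
# Sketch: luminance change lives in the FFT amplitude, structure in the phase.
import numpy as np

def amp_phase(x: np.ndarray):
    spec = np.fft.fft2(x, axes=(0, 1))
    return np.abs(spec), np.angle(spec)

low = np.random.rand(128, 128, 3) * 0.2           # synthetic low-light image
bright = np.clip(low * 4.0, 0, 1)                 # crude luminance enhancement
_, low_phase = amp_phase(low)
bright_amp, _ = amp_phase(bright)

recombined = np.real(np.fft.ifft2(bright_amp * np.exp(1j * low_phase), axes=(0, 1)))
print(low.mean(), np.clip(recombined, 0, 1).mean())   # brightness follows the amplitude
```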
https://arxiv.org/abs/2302.11831
Deep-learning-based image enhancement models have largely improved the readability of fundus images, decreasing the uncertainty of clinical observations and the risk of misdiagnosis. However, due to the difficulty of acquiring paired real fundus images at different quality levels, most existing methods have to adopt synthetic image pairs as training data. The domain shift between the synthetic and real images inevitably hinders the generalization of such models on clinical data. In this work, we propose an end-to-end optimized teacher-student framework that simultaneously conducts image enhancement and domain adaptation. The student network uses synthetic pairs for supervised enhancement and regularizes the enhancement model to reduce domain shift by enforcing teacher-student prediction consistency on real fundus images, without relying on enhanced ground truth. Moreover, we also propose a novel multi-stage, multi-attention guided enhancement network (MAGE-Net) as the backbone of our teacher and student networks. MAGE-Net utilizes a multi-stage enhancement module and a retinal structure preservation module to progressively integrate multi-scale features while preserving retinal structures for better fundus image quality enhancement. Comprehensive experiments on both real and synthetic datasets demonstrate that our framework outperforms the baseline approaches. Moreover, our method also benefits downstream clinical tasks.
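A hedged sketch of the training signal described above: a supervised loss on synthetic pairs plus a consistency loss between teacher and student predictions on real, unlabeled fundus images. The EMA teacher update and the 0.1 weighting are assumptions about one common way to run such a framework, not necessarily what MAGE-Net does.

```python
# Sketch: supervised synthetic loss + teacher-student consistency on real images.
import torch
import torch.nn.functional as F

def training_losses(student, teacher, syn_lq, syn_gt, real_lq):
    supervised = F.l1_loss(student(syn_lq), syn_gt)       # synthetic pairs have GT
    with torch.no_grad():
        teacher_pred = teacher(real_lq)                   # real images have no enhanced GT
    consistency = F.l1_loss(student(real_lq), teacher_pred)
    return supervised + 0.1 * consistency                 # weighting is illustrative

@torch.no_grad()
def ema_update(teacher, student, momentum: float = 0.999):
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(momentum).add_(s_p, alpha=1.0 - momentum)

# Toy usage with placeholder single-layer "networks".
student = torch.nn.Conv2d(3, 3, 3, padding=1)
teacher = torch.nn.Conv2d(3, 3, 3, padding=1)
loss = training_losses(student, teacher,
                       torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64),
                       torch.rand(2, 3, 64, 64))
```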
https://arxiv.org/abs/2302.11795
Underwater image enhancement (UIE) is vital for high-level vision-related underwater tasks. Although learning-based UIE methods have made remarkable achievements in recent years, it is still challenging for them to consistently deal with various underwater conditions. This can be attributed to two causes: 1) the use of the simplified atmospheric image formation model in UIE may result in severe errors; 2) a network trained solely on synthetic images may have difficulty generalizing well to real underwater images. In this work, we, for the first time, propose a framework, SyreaNet, for UIE that integrates both synthetic and real data under the guidance of the revised underwater image formation model and novel domain adaptation (DA) strategies. First, an underwater image synthesis module based on the revised model is proposed. Then, a physically guided disentangled network is designed to predict clear images by combining both synthetic and real underwater images. The intra- and inter-domain gaps are bridged by fully exchanging the domain knowledge. Extensive experiments demonstrate the superiority of our framework over other state-of-the-art (SOTA) learning-based UIE methods qualitatively and quantitatively. The code and dataset are publicly available at this https URL.
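For orientation, the "revised" underwater image formation model usually refers to a formulation with separate attenuation coefficients for the direct signal and the backscatter, in contrast to the single-coefficient atmospheric model. The sketch below implements that general form; SyreaNet's exact parameterization may differ, and the coefficient values are illustrative.

```python
# Sketch: revised underwater image formation with separate direct/backscatter attenuation.
import numpy as np

def revised_underwater_model(J, depth, beta_direct, beta_backscatter, veiling_light):
    """J: HxWx3 clean image, depth: HxWx1 range map in meters, betas/veiling: (3,) per channel."""
    direct = J * np.exp(-beta_direct.reshape(1, 1, 3) * depth)
    backscatter = veiling_light.reshape(1, 1, 3) * (
        1.0 - np.exp(-beta_backscatter.reshape(1, 1, 3) * depth))
    return direct + backscatter

J = np.random.rand(64, 64, 3)
z = np.full((64, 64, 1), 3.0)                            # 3 m scene range
I = revised_underwater_model(J, z,
                             np.array([0.40, 0.12, 0.08]),   # red attenuates fastest
                             np.array([0.35, 0.10, 0.07]),
                             np.array([0.15, 0.45, 0.40]))   # greenish veiling light
```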
https://arxiv.org/abs/2302.08269
Non-mydriatic retinal color fundus photography (CFP) is widely available because it does not require pupillary dilation; however, it is prone to poor quality due to operator error, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leverage Optimal Transport (OT) theory to propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts. Furthermore, to improve the flexibility, robustness, and applicability of our image enhancement pipeline in clinical practice, we generalize a state-of-the-art model-based image reconstruction method, regularization by denoising, by plugging in priors learned by our OT-guided image-to-image translation network. We name this regularization by enhancing (RE). We validate the integrated framework, OTRE, on three publicly available retinal image datasets by assessing the quality after enhancement and the performance on various downstream tasks, including diabetic retinopathy grading, vessel segmentation, and diabetic lesion segmentation. The experimental results demonstrate the superiority of our proposed framework over several state-of-the-art unsupervised competitors and a state-of-the-art supervised method.
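For readers unfamiliar with regularization by denoising (RED), the sketch below shows the standard RED-style gradient iteration with the learned enhancer playing the role of the denoiser, which is one plausible reading of "regularization by enhancing". The data term, step sizes, and toy operators are illustrative; the paper's actual formulation may differ.

```python
# Sketch: RED-style iteration  x <- x - mu * [A^T(Ax - y) + lam * (x - E(x))].
import numpy as np

def re_iterations(y, enhancer, forward_op, adjoint_op, lam=0.1, mu=0.5, steps=20):
    """Minimize 0.5||A x - y||^2 + (lam/2) x^T (x - E(x)) by explicit gradient steps."""
    x = adjoint_op(y)                                    # simple initialization
    for _ in range(steps):
        data_grad = adjoint_op(forward_op(x) - y)        # gradient of the data-fidelity term
        prior_grad = lam * (x - enhancer(x))             # RED-style prior gradient
        x = x - mu * (data_grad + prior_grad)
    return np.clip(x, 0.0, 1.0)

# Toy usage: identity forward model, hypothetical smoothing "enhancer".
identity = lambda v: v
smooth = lambda v: 0.5 * v + 0.5 * v.mean()
restored = re_iterations(np.random.rand(64, 64, 3), smooth, identity, identity)
```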
https://arxiv.org/abs/2302.03003
To assist underwater object detection for better performance, image enhancement technology is often used as a pre-processing step. However, most existing enhancement methods tend to pursue the visual quality of an image instead of providing effective help for detection tasks. In fact, image enhancement algorithms should be optimized with the goal of utility improvement. In this paper, to adapt to underwater detection tasks, we propose a lightweight dynamic enhancement algorithm that uses a contribution dictionary to guide low-level corrections. Dynamic solutions are designed to capture differences in detection preferences. In addition, the method can balance the inconsistency between the contribution of correction operations and their time complexity. Experimental results on real underwater object detection tasks show the superiority of our proposed method in both generalization and real-time performance.
https://arxiv.org/abs/2302.02553
Self-supervised depth estimation has drawn much attention recently, as it can improve the 3D sensing capabilities of self-driving vehicles. However, it intrinsically relies on the photometric consistency assumption, which hardly holds at nighttime. Although various supervised nighttime image enhancement methods have been proposed, their generalization performance in challenging driving scenarios is not satisfactory. To this end, we propose the first method that jointly learns a nighttime image enhancer and a depth estimator, without using ground truth for either task. Our method tightly entangles the two self-supervised tasks using a newly proposed uncertain pixel masking strategy. This strategy originates from the observation that nighttime images suffer not only from underexposed regions but also from overexposed regions. By fitting a bridge-shaped curve to the illumination map distribution, both kinds of regions are suppressed and the two tasks are bridged naturally. We benchmark the method on two established datasets, nuScenes and RobotCar, and demonstrate state-of-the-art performance on both. Detailed ablations also reveal the mechanism of our proposal. Last but not least, to mitigate the problem of sparse ground truth in existing datasets, we provide a new photo-realistically enhanced nighttime dataset based on CARLA. It brings meaningful new challenges to the community. Codes, data, and models are available at this https URL.
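To illustrate the "bridge-shaped" masking idea, the sketch below weights each pixel by a curve over the illumination map that peaks at mid-range brightness and vanishes at both extremes, so underexposed and overexposed pixels are suppressed together. The specific parabola is a hypothetical stand-in for the curve fitted in the paper.

```python
# Sketch: bridge-shaped per-pixel weight over an illumination map.
import numpy as np

def bridge_weight(illumination: np.ndarray) -> np.ndarray:
    """illumination in [0,1]; weights peak at 0.5 and vanish at 0 and 1."""
    return np.clip(4.0 * illumination * (1.0 - illumination), 0.0, 1.0)

img = np.random.rand(128, 128, 3)
illum = img.mean(axis=2)                  # crude illumination map
mask = bridge_weight(illum)               # down-weights both dark and saturated pixels
# The mask can then gate the photometric loss used for self-supervised depth.
```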
https://arxiv.org/abs/2302.01334
The degradation in underwater images is due to wavelength-dependent light attenuation and scattering, and to the diversity of the water types in which they are captured. Deep neural networks take a step forward in this field, providing autonomous models able to enhance underwater images. We introduce the Underwater Capsules Vectors GAN (UWCVGAN), based on the discrete feature quantization paradigm of VQGAN, for this task. The proposed UWCVGAN combines an encoding network, which compresses the image into its latent representation, with a decoding network able to reconstruct the enhanced image from the latent representation alone. In contrast with VQGAN, UWCVGAN achieves feature quantization by exploiting the clusterization ability of the capsule layer, making the model completely trainable and easier to manage. The model obtains enhanced underwater images with high quality and fine details. Moreover, the trained encoder is independent of the decoder, so it can be embedded onto the collector as a compression algorithm to reduce the memory space required for the images by a factor of 3x. UWCVGAN is validated with quantitative and qualitative analysis on benchmark datasets, and we present metric results compared with the state of the art.
https://arxiv.org/abs/2302.01144
Film, a classic image style, is culturally significant to the whole photographic industry, since it marks the birth of photography. However, film photography is time-consuming and expensive, necessitating a more efficient method for collecting film-style photographs. The numerous datasets that have emerged in the field of image enhancement so far are not film-specific. To facilitate film-based image stylization research, we construct FilmSet, a large-scale and high-quality film style dataset. Our dataset includes three different film types and more than 5000 in-the-wild high-resolution images. Inspired by the features of FilmSet images, we propose a novel framework called FilmNet, based on the Laplacian pyramid, for stylizing images across frequency bands and achieving film-style outcomes. Experiments reveal that the performance of our model is superior to that of state-of-the-art techniques. Our dataset and code will be made publicly available.
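The Laplacian pyramid that FilmNet builds on splits an image into frequency bands that can be processed separately and recombined losslessly. Below is a minimal decomposition/reconstruction with OpenCV; the per-band stylization network itself is not shown.

```python
# Sketch: Laplacian pyramid decomposition and exact reconstruction.
import cv2
import numpy as np

def build_laplacian_pyramid(img: np.ndarray, levels: int = 3):
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)       # band-pass residual at this scale
        current = down
    pyramid.append(current)                # low-frequency residual
    return pyramid

def reconstruct(pyramid):
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = cv2.pyrUp(current, dstsize=(band.shape[1], band.shape[0])) + band
    return current

img = (np.random.rand(256, 256, 3) * 255).astype(np.float32)
bands = build_laplacian_pyramid(img)
print(np.allclose(reconstruct(bands), img, atol=1e-3))   # lossless round trip
```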
https://arxiv.org/abs/2301.08880