Learning-based underwater image enhancement (UIE) methods have made great progress. However, the lack of large-scale and high-quality paired training samples has become the main bottleneck hindering the development of UIE. The inter-frame information in underwater videos can accelerate or optimize the UIE process. Thus, we constructed the first large-scale high-resolution underwater video enhancement benchmark (UVEB) to promote the development of underwater vision. UVEB contains 1,308 pairs of video sequences and more than 453,000 high-resolution frame pairs, 38% of which are Ultra-High-Definition (UHD) 4K. UVEB comes from multiple countries, containing various scenes and video degradation types to adapt to diverse and complex underwater environments. We also propose the first supervised underwater video enhancement method, UVE-Net. UVE-Net converts the current frame information into convolutional kernels and passes them to adjacent frames for efficient inter-frame information exchange. By fully utilizing the redundant degraded information of underwater videos, UVE-Net achieves better video enhancement. Experiments demonstrate the effectiveness of UVE-Net's network design and its strong performance.
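The kernel-passing mechanism lends itself to a short sketch. Below is a hedged PyTorch illustration, under our own assumptions about shapes and module design rather than the authors' released code, of converting the current frame's features into convolution kernels that are applied to an adjacent frame:

```python
# A minimal PyTorch sketch of the idea: pool the current frame's features
# into per-channel depthwise kernels and convolve the neighboring frame's
# features with them. Module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameToKernel(nn.Module):
    """Generates a depthwise k x k kernel per channel from the current frame."""
    def __init__(self, channels: int, ksize: int = 3):
        super().__init__()
        self.channels, self.ksize = channels, ksize
        self.pool = nn.AdaptiveAvgPool2d(1)           # summarize current frame
        self.to_kernel = nn.Linear(channels, channels * ksize * ksize)

    def forward(self, cur_feat: torch.Tensor, nbr_feat: torch.Tensor):
        b, c, h, w = nbr_feat.shape
        summary = self.pool(cur_feat).flatten(1)                  # (B, C)
        kernels = self.to_kernel(summary)                         # (B, C*k*k)
        kernels = kernels.view(b * c, 1, self.ksize, self.ksize)  # depthwise
        # Per-sample kernels via a grouped convolution over a reshaped batch.
        x = nbr_feat.reshape(1, b * c, h, w)
        out = F.conv2d(x, kernels, padding=self.ksize // 2, groups=b * c)
        return out.view(b, c, h, w)

cur = torch.randn(2, 16, 64, 64)   # current-frame features
nbr = torch.randn(2, 16, 64, 64)   # adjacent-frame features
print(FrameToKernel(16)(cur, nbr).shape)  # torch.Size([2, 16, 64, 64])
```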
https://arxiv.org/abs/2404.14542
This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlighting, extreme darkness, and night scenes. A notable total of 428 participants registered for the challenge, with 22 teams ultimately making valid submissions. This paper meticulously evaluates the state-of-the-art advancements in enhancing low-light images, reflecting the significant progress and creativity in this field.
https://arxiv.org/abs/2404.14248
Extremely low-light text images are common in natural scenes, making scene text detection and recognition challenging. One solution is to enhance these images using low-light image enhancement methods before text extraction. However, previous methods often fail to specifically address the significance of low-level features, which are crucial for optimal performance on downstream scene text tasks. Further research is also hindered by the lack of extremely low-light text datasets. To address these limitations, we propose a novel encoder-decoder framework with an edge-aware attention module to focus on scene text regions during enhancement. Our proposed method uses novel text detection and edge reconstruction losses to emphasize low-level scene text features, leading to successful text extraction. Additionally, we present a Supervised Deep Curve Estimation (Supervised-DCE) model to synthesize extremely low-light images based on publicly available scene text datasets such as ICDAR15 (IC15). We also labeled texts in the extremely low-light See In the Dark (SID) and ordinary LOw-Light (LOL) datasets to allow for objective assessment of extremely low-light image enhancement through scene text tasks. Extensive experiments show that our model outperforms state-of-the-art methods in terms of both image quality and scene text metrics on the widely used LOL, SID, and synthetic IC15 datasets. Code and dataset will be released publicly at this https URL.
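The edge-aware attention module is the load-bearing component here; below is a minimal sketch of one plausible reading, in which a Sobel-style edge map gates the features so enhancement concentrates on high-gradient, text-bearing regions (our assumption, not the paper's exact design):

```python
# A hedged sketch of edge-aware attention: an edge map computed from the
# features re-weights them toward text-like, high-gradient regions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeAwareAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer(
            "kx", sobel_x.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.register_buffer(
            "ky", sobel_x.t().reshape(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        c = feat.size(1)
        gx = F.conv2d(feat, self.kx, padding=1, groups=c)   # horizontal edges
        gy = F.conv2d(feat, self.ky, padding=1, groups=c)   # vertical edges
        attn = torch.sigmoid(self.mix(torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)))
        return feat * attn        # emphasize edge (text) regions

feat = torch.randn(1, 16, 64, 64)
print(EdgeAwareAttention(16)(feat).shape)  # torch.Size([1, 16, 64, 64])
```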
https://arxiv.org/abs/2404.14135
In real-world scenarios, captured images often suffer from blurring, noise, and other forms of degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. However, performing a single type of image enhancement alone still cannot yield satisfactory images. In this paper, to deal with this challenge, we propose the Composite Refinement Network (CRNet), which addresses it using multiple exposure images. By fully integrating information-rich multiple exposure inputs, CRNet can perform unified image restoration and enhancement. To improve the quality of image details, CRNet explicitly separates and strengthens high- and low-frequency information through pooling layers, using specially designed Multi-Branch Blocks to fuse these frequencies effectively. To increase the receptive field and fully integrate input features, CRNet employs the High-Frequency Enhancement Module, which includes large kernel convolutions and an inverted bottleneck ConvFFN. Our model secured third place in the first track of the Bracketing Image Restoration and Enhancement Challenge, surpassing previous SOTA models in both testing metrics and visual quality.
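The pooling-based frequency separation admits a compact sketch. Assuming a simple average-pool decomposition (the actual Multi-Branch Blocks are more elaborate), low frequencies are the pooled-and-upsampled copy and high frequencies the residual:

```python
# A minimal sketch, under our own assumptions, of separating low- and
# high-frequency components with pooling, as CRNet's description suggests.
import torch
import torch.nn.functional as F

def split_frequencies(x: torch.Tensor, pool: int = 4):
    """Low frequencies: blurred (pooled) copy; high frequencies: residual."""
    low = F.avg_pool2d(x, kernel_size=pool)
    low = F.interpolate(low, size=x.shape[-2:], mode="bilinear",
                        align_corners=False)
    high = x - low          # what pooling removed: edges and fine texture
    return low, high

x = torch.randn(1, 3, 128, 128)
low, high = split_frequencies(x)
assert torch.allclose(low + high, x, atol=1e-6)  # lossless decomposition
```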
https://arxiv.org/abs/2404.14132
Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering. In recent years, both Convolutional Neural Network (CNN)-based and Transformer-based methods have been widely explored. In addition, combining CNN and Transformer can effectively combine global and local information for enhancement. However, this approach is still affected by the quadratic complexity of the Transformer and cannot maximize performance. Recently, the state-space model (SSM) based architecture Mamba has been proposed, which excels at long-range modeling while maintaining linear complexity. This paper explores the potential of this SSM-based model for UIE from both efficiency and effectiveness perspectives. However, directly applying Mamba performs poorly, because local fine-grained features, which are crucial for image enhancement, cannot be fully utilized. To this end, we customize the MambaUIE architecture for efficient UIE. Specifically, we introduce visual state space (VSS) blocks to capture global contextual information at the macro level while mining local information at the micro level. For these two kinds of information, we propose a Dynamic Interaction Block (DIB) and a Spatial feed-forward Network (SGFN) for intra-block feature aggregation. MambaUIE is able to efficiently synthesize global and local information while maintaining a very small number of parameters and high accuracy. Experiments on the UIEB dataset show that our method reduces GFLOPs by 67.4% (2.715G) relative to the SOTA method. To the best of our knowledge, this is the first UIE model constructed based on SSM that breaks the limitation of FLOPs on accuracy in UIE. The official repository of MambaUIE is available at this https URL.
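The abstract leaves SGFN unexpanded; in comparable restoration architectures the acronym denotes a spatially gated feed-forward block, so the hedged sketch below reflects that reading rather than MambaUIE's released code:

```python
# A hedged sketch of a spatially gated feed-forward block: one branch gates
# the other after a depthwise convolution. Our interpretation of SGFN.
import torch
import torch.nn as nn

class SpatialGatedFFN(nn.Module):
    def __init__(self, dim: int, expand: int = 2):
        super().__init__()
        hidden = dim * expand
        self.proj_in = nn.Conv2d(dim, hidden * 2, 1)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.proj_out = nn.Conv2d(hidden, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, gate = self.proj_in(x).chunk(2, dim=1)
        # Spatial gating: the depthwise branch is modulated pixel-wise.
        return self.proj_out(self.dwconv(a) * torch.sigmoid(gate))

x = torch.randn(1, 32, 64, 64)
print(SpatialGatedFFN(32)(x).shape)   # torch.Size([1, 32, 64, 64])
```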
https://arxiv.org/abs/2404.13884
Underwater images taken from autonomous underwater vehicles (AUVs) often suffer from low light, high turbidity, poor contrast, motion blur, and excessive light scattering, and hence require image enhancement techniques for object recognition. Machine learning methods are being increasingly used for object recognition under such adverse conditions. These enhanced object recognition methods for images taken from AUVs have potential applications in underwater pipeline and optical fibre surveillance, ocean bed resource extraction, ocean floor mapping, underwater species exploration, etc. While the classical machine learning methods are very efficient in terms of accuracy, they require large datasets and high computational time for image classification. In the current work, we use quantum-classical hybrid machine learning methods for real-time underwater object recognition on-board an AUV for the first time. We use real-time motion-blurred and low-light images taken from the on-board camera of an AUV built in-house and apply existing hybrid machine learning methods for object recognition. Our hybrid methods consist of quantum encoding and flattening of classical images using quantum circuits and sending them to classical neural networks for image classification. The results of hybrid methods carried out using PennyLane-based quantum simulators, both on GPU and using pre-trained models on an on-board NVIDIA GPU chipset, are compared with results from corresponding classical machine learning methods. We observe that the hybrid quantum machine learning methods show an efficiency greater than 65% and a reduction in run-time by one-third, and require 50% smaller dataset sizes for training the models compared to classical machine learning methods. We hope that our work opens up further possibilities in quantum-enhanced real-time computer vision in autonomous vehicles.
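The quantum-encoding front end can be sketched with PennyLane. The circuit below angle-encodes a flattened patch of normalized pixels, entangles the qubits, and measures Pauli-Z expectations that would feed a classical classifier; the circuit design and sizes are our assumptions, not the authors' architecture:

```python
# A hedged sketch of a quantum-encoding front end of the kind the paper
# describes. Requires the pennylane package; runs on a classical simulator.
import numpy as np
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def encode_patch(pixels):
    """Angle-encode a flattened image patch and read out Z expectations."""
    qml.AngleEmbedding(pixels, wires=range(n_qubits))
    for w in range(n_qubits):                      # ring entanglement
        qml.CNOT(wires=[w, (w + 1) % n_qubits])
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

patch = np.random.rand(n_qubits) * np.pi   # 4 normalized pixel values
features = encode_patch(patch)             # quantum features -> classical NN
print(features)
```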
https://arxiv.org/abs/2404.13130
Accurate localization is fundamental for autonomous underwater vehicles (AUVs) to carry out precise tasks, such as manipulation and construction. Vision-based solutions using fiducial markers are promising, but extremely challenging underwater because of harsh underwater lighting conditions. This paper introduces a gradient-based active camera exposure control method to tackle sharp lighting variations during image acquisition, which establishes a better foundation for subsequent image enhancement procedures. Considering a typical scenario for underwater operations where visual tags are used, we conducted several experiments comparing our method with other state-of-the-art exposure control methods, including Active Exposure Control (AEC) and Gradient-based Exposure Control (GEC). Results show a significant improvement in the accuracy of robot localization. This method is an important component that can be used in a vision-based state estimation pipeline to improve the overall localization accuracy.
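Gradient-based exposure control rests on a simple quantity: a gradient-information score that well-exposed images maximize. Below is a minimal sketch of that principle (not the paper's exact controller), simulating candidate gains and keeping the most informative one:

```python
# A minimal sketch of gradient-based exposure selection using OpenCV.
# The candidate-gain search is our simplification of an active controller.
import cv2
import numpy as np

def gradient_score(img_gray: np.ndarray) -> float:
    """Sum of gradient magnitudes; saturated regions contribute little."""
    gx = cv2.Sobel(img_gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(img_gray, cv2.CV_32F, 0, 1, ksize=3)
    return float(np.sqrt(gx ** 2 + gy ** 2).sum())

def best_exposure(img_gray: np.ndarray, gains=(0.5, 0.75, 1.0, 1.5, 2.0)):
    """Simulate candidate exposure gains and keep the most informative one."""
    scored = []
    for g in gains:
        simulated = np.clip(img_gray.astype(np.float32) * g, 0, 255)
        scored.append((gradient_score(simulated), g))
    return max(scored)[1]   # gain to send to the camera's exposure setting

img = (np.random.rand(120, 160) * 255).astype(np.uint8)
print("suggested exposure gain:", best_exposure(img))
```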
https://arxiv.org/abs/2404.12055
This study addresses the evolving challenges in urban traffic monitoring and detection systems based on fisheye lens cameras by proposing a framework that improves the efficacy and accuracy of these systems. In the context of urban infrastructure and transportation management, advanced traffic monitoring systems have become critical for managing the complexities of urbanization and increasing vehicle density. Traditional monitoring methods, which rely on static cameras with narrow fields of view, are ineffective in dynamic urban environments, necessitating the installation of multiple cameras, which raises costs. Fisheye lenses, which were recently introduced, provide wide and omnidirectional coverage in a single frame, making them a transformative solution. However, issues such as distorted views and blurriness arise, preventing accurate object detection in these images. Motivated by these challenges, this study proposes a novel approach that combines a Transformer-based image enhancement framework and an ensemble learning technique to address these challenges and improve traffic monitoring accuracy, making significant contributions to the future of intelligent traffic management systems. Our proposed methodological framework won 5th place in the 2024 AI City Challenge, Track 4, with an F1 score of 0.5965 on experimental validation data. The experimental results demonstrate the effectiveness, efficiency, and robustness of the proposed system. Our code is publicly available at this https URL.
https://arxiv.org/abs/2404.10078
Image restoration, which aims to recover high-quality images from their corrupted counterparts, often faces the challenge of being an ill-posed problem that allows multiple solutions for a single input. However, most deep learning based works simply employ an l1 loss to train their network in a deterministic way, resulting in over-smoothed predictions with inferior perceptual quality. In this work, we propose a novel method that shifts the focus from a deterministic pixel-by-pixel comparison to a statistical perspective, emphasizing the learning of distributions rather than individual pixel values. The core idea is to introduce spatial entropy into the loss function to measure the distribution difference between predictions and targets. To make this spatial entropy differentiable, we employ kernel density estimation (KDE) to approximate the probabilities for specific intensity values of each pixel within its neighborhood. Specifically, we equip the entropy with diffusion models and aim for superior accuracy and enhanced perceptual quality over the l1-based noise matching loss. In the experiments, we evaluate the proposed method for low-light enhancement on two datasets and the NTIRE 2024 challenge. All these results illustrate the effectiveness of our statistics-based entropy loss. Code is available at this https URL.
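The differentiable-entropy idea can be sketched concisely: soften per-image intensity histograms with a Gaussian kernel so a distribution-level loss admits gradients. The sketch below is a simplified global version under our own assumptions; the paper's spatial formulation works over pixel neighborhoods:

```python
# A hedged sketch of a differentiable distribution loss via KDE: soft
# histograms make the intensity distribution, and hence an entropy-style
# divergence, differentiable with respect to the prediction.
import torch

def kde_histogram(x: torch.Tensor, n_bins: int = 32, sigma: float = 0.02):
    """Soft histogram over [0, 1] intensities; x is (B, N) pixel values."""
    centers = torch.linspace(0.0, 1.0, n_bins, device=x.device)
    # Gaussian kernel weight of every pixel for every bin center.
    w = torch.exp(-0.5 * ((x.unsqueeze(-1) - centers) / sigma) ** 2)
    hist = w.sum(dim=1)
    return hist / hist.sum(dim=-1, keepdim=True)       # (B, n_bins)

def entropy_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """KL divergence between soft intensity distributions of (B,C,H,W) in [0,1]."""
    p = kde_histogram(pred.flatten(1))
    q = kde_histogram(target.flatten(1))
    return (q * (q.clamp_min(1e-8).log() - p.clamp_min(1e-8).log())).sum(-1).mean()

pred = torch.rand(2, 3, 64, 64, requires_grad=True)
target = torch.rand(2, 3, 64, 64)
loss = entropy_loss(pred, target)
loss.backward()                     # gradients flow through the soft histogram
print(loss.item())
```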
https://arxiv.org/abs/2404.09735
Improving instance-specific image goal navigation (InstanceImageNav), which locates an identical object in a real-world environment from a query image, is essential for robotic systems to assist users in finding desired objects. The challenge lies in the domain gap between the low-quality images observed by the moving robot, characterized by motion blur and low resolution, and the high-quality query images provided by the user. Such domain gaps can significantly reduce the task success rate but have not been the focus of previous work. To address this, we propose a novel method called Few-shot Cross-quality Instance-aware Adaptation (CrossIA), which employs contrastive learning with an instance classifier to align features between many low-quality and a few high-quality images. This approach effectively reduces the domain gap by bringing the latent representations of cross-quality images closer on an instance basis. Additionally, the system integrates an object image collection with a pre-trained deblurring model to enhance the observed image quality. Our method fine-tunes the SimSiam model, pre-trained on ImageNet, using CrossIA. We evaluated our method's effectiveness through an InstanceImageNav task with 20 different types of instances, where the robot identifies, in a real-world environment, the same instance shown in a high-quality query image. Our experiments showed that our method improves the task success rate by up to three times compared to the baseline, a conventional approach based on SuperGlue. These findings highlight the potential of leveraging contrastive learning and image enhancement techniques to bridge the domain gap and improve object localization in robotic applications. The project website is this https URL.
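The instance-level contrastive alignment can be illustrated with an InfoNCE-style loss in which embeddings sharing an instance label, regardless of image quality, are treated as positives. This is our illustration, not the CrossIA training code:

```python
# A minimal sketch of instance-level contrastive alignment: same-instance
# embeddings across quality levels are pulled together.
import torch
import torch.nn.functional as F

def instance_contrastive_loss(emb: torch.Tensor, instance_ids: torch.Tensor,
                              tau: float = 0.1) -> torch.Tensor:
    """emb: (N, D) embeddings; instance_ids: (N,) instance labels."""
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t() / tau                       # pairwise similarities
    n = emb.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=emb.device)
    pos = (instance_ids.unsqueeze(0) == instance_ids.unsqueeze(1)) & ~eye
    # Log-softmax over all other samples, averaged over positive pairs.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, -1e9), dim=1,
                                     keepdim=True)
    return -(log_prob[pos]).mean()

emb = torch.randn(8, 128)                # 4 instances x 2 quality levels
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(instance_contrastive_loss(emb, ids).item())
```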
https://arxiv.org/abs/2404.09645
Degraded underwater images decrease the accuracy of underwater object detection. However, existing methods for underwater image enhancement mainly focus on improving visual quality indicators, which may not benefit underwater object detection tasks and may even lead to serious degradation in detection performance. To alleviate this problem, we propose a bidirectional-guided method for underwater object detection, referred to as BG-YOLO. In the proposed method, the network is organized into two parallel branches: an enhancement branch and a detection branch. The enhancement branch consists of a cascade of an image enhancement subnet and an object detection subnet, while the detection branch consists of a detection subnet alone. A feature-guided module connects the shallow convolutional layers of the two branches. When training the enhancement branch, its object detection subnet guides the image enhancement subnet to be optimized in the direction most conducive to the detection task. The shallow feature maps of the trained enhancement branch are then fed to the feature-guided module, which constrains the optimization of the detection branch through a consistency loss and prompts the detection branch to learn more detailed object information, thereby refining detection performance. At inference time, only the detection branch is retained, so no additional computational cost is introduced. Extensive experiments demonstrate that the proposed method significantly improves detector performance in severely degraded underwater scenes while maintaining a remarkable detection speed.
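The consistency loss between the branches' shallow features admits a short sketch. The L2 formulation and the frozen guide are our assumptions; the paper specifies only that the feature-guided module constrains the detection branch:

```python
# A hedged sketch of the feature-guided consistency idea: the detection
# branch's shallow features are pulled toward those of the trained
# enhancement branch, which acts as a frozen guide.
import torch
import torch.nn.functional as F

def feature_consistency_loss(det_feat: torch.Tensor,
                             enh_feat: torch.Tensor) -> torch.Tensor:
    """L2 distance between normalized shallow feature maps of the branches."""
    det = F.normalize(det_feat.flatten(1), dim=1)
    enh = F.normalize(enh_feat.flatten(1), dim=1).detach()  # guide is frozen
    return F.mse_loss(det, enh)

det_feat = torch.randn(4, 64, 80, 80, requires_grad=True)  # detection branch
enh_feat = torch.randn(4, 64, 80, 80)                      # enhancement branch
loss = feature_consistency_loss(det_feat, enh_feat)
loss.backward()     # only the detection branch receives gradients
print(loss.item())
```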
https://arxiv.org/abs/2404.08979
Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by a detector, LLE is primarily designed for human vision rather than machines and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in the dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module is designed to guide the text detector in preserving textual spatial features amidst feature map resizing, thus minimizing the loss of spatial information in texts under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual range knowledge. Our approach enhances the original text detector's ability to identify text's local topological features using a dynamic snake feature pyramid network and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal-light datasets. The code and dataset will be released.
https://arxiv.org/abs/2404.08965
In this paper, we present an improved CycleGAN-based model for underwater image enhancement. We utilize the cycle-consistent learning technique of the state-of-the-art CycleGAN model and modify its loss function with depth-oriented attention, which enhances the contrast of the overall image while keeping global content, color, local texture, and style information intact. We trained the CycleGAN model with the modified loss functions on the benchmark Enhancing Underwater Visual Perception (EUVP) dataset, a large dataset including paired and unpaired sets of underwater images (poor and good quality) taken with seven distinct cameras in a range of visibility situations during research on ocean exploration and human-robot cooperation. In addition, we perform qualitative and quantitative evaluations that support the applied technique and demonstrate a better contrast enhancement model for underwater imagery. More significantly, the enhanced images yield better results than conventional models for underwater navigation, pose estimation, saliency prediction, object detection, and tracking. The results validate the appropriateness of the model for visual navigation of autonomous underwater vehicles (AUVs).
https://arxiv.org/abs/2404.07649
This study systematically investigates the impact of image enhancement techniques on Convolutional Neural Network (CNN)-based Brain Tumor Segmentation, focusing on Histogram Equalization (HE), Contrast Limited Adaptive Histogram Equalization (CLAHE), and their hybrid variations. Employing the U-Net architecture on a dataset of 3064 Brain MRI images, the research delves into preprocessing steps, including resizing and enhancement, to optimize segmentation accuracy. A detailed analysis of the CNN-based U-Net architecture, training, and validation processes is provided. The comparative analysis, utilizing metrics such as Accuracy, Loss, MSE, IoU, and DSC, reveals that the hybrid approach CLAHE-HE consistently outperforms others. Results highlight its superior accuracy (0.9982, 0.9939, 0.9936 for training, testing, and validation, respectively) and robust segmentation overlap, with Jaccard values of 0.9862, 0.9847, and 0.9864, and Dice values of 0.993, 0.9923, and 0.9932 for the same phases, emphasizing its potential in neuro-oncological applications. The study concludes with a call for refinement in segmentation methodologies to further enhance diagnostic precision and treatment planning in neuro-oncology.
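The compared enhancement variants are straightforward to reproduce with OpenCV. In the sketch below, the composition order of the hybrid CLAHE-HE pipeline is our assumption:

```python
# A minimal sketch of the enhancement variants compared in the study,
# using OpenCV on a grayscale slice.
import cv2
import numpy as np

def he(img_gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization."""
    return cv2.equalizeHist(img_gray)

def clahe(img_gray: np.ndarray, clip: float = 2.0, tiles: int = 8) -> np.ndarray:
    """Contrast-limited adaptive histogram equalization."""
    return cv2.createCLAHE(clipLimit=clip,
                           tileGridSize=(tiles, tiles)).apply(img_gray)

def clahe_he(img_gray: np.ndarray) -> np.ndarray:
    """Hybrid: local contrast first (CLAHE), then global equalization (HE)."""
    return he(clahe(img_gray))

mri_slice = (np.random.rand(256, 256) * 255).astype(np.uint8)
enhanced = clahe_he(mri_slice)   # preprocessed input for the U-Net
print(enhanced.shape, enhanced.dtype)
```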
https://arxiv.org/abs/2404.05341
Low-light image enhancement (LLIE) aims to improve low-illumination images. However, existing methods face two challenges: (1) uncertainty in restoration from diverse brightness degradations; (2) loss of texture and color information caused by noise suppression and light enhancement. In this paper, we propose a novel enhancement approach, CodeEnhance, that leverages quantized priors and image refinement to address these challenges. In particular, we reframe LLIE as learning an image-to-code mapping from low-light images to a discrete codebook that has been learned from high-quality images. To enhance this process, a Semantic Embedding Module (SEM) is introduced to integrate semantic information with low-level features, along with a Codebook Shift (CS) mechanism designed to adapt the pre-learned codebook to better suit the distinct characteristics of our low-light dataset. Additionally, we present an Interactive Feature Transformation (IFT) module to refine texture and color information during image reconstruction, allowing for interactive enhancement based on user preferences. Extensive experiments on both real-world and synthetic benchmarks demonstrate that the incorporation of prior knowledge and controllable information transfer significantly enhances LLIE performance in terms of quality and fidelity. The proposed CodeEnhance exhibits superior robustness to various degradations, including uneven illumination, noise, and color distortion.
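At the heart of quantized-prior methods such as CodeEnhance is a nearest-codeword lookup against a codebook learned from high-quality images. Below is a hedged sketch using the standard straight-through VQ trick (common practice, not necessarily the paper's exact scheme):

```python
# A minimal sketch of the image-to-code lookup: each feature vector is
# replaced by its nearest codeword from a learned codebook.
import torch

def quantize(feat: torch.Tensor, codebook: torch.Tensor):
    """feat: (N, D) features; codebook: (K, D) learned codewords."""
    d = torch.cdist(feat, codebook)          # (N, K) pairwise distances
    idx = d.argmin(dim=1)                    # nearest codeword per feature
    quantized = codebook[idx]
    # Straight-through estimator: forward uses codewords, backward passes
    # gradients to the encoder features unchanged.
    quantized = feat + (quantized - feat).detach()
    return quantized, idx

feat = torch.randn(16, 64, requires_grad=True)
codebook = torch.randn(512, 64)              # prior learned from clean images
q, idx = quantize(feat, codebook)
q.sum().backward()                           # gradients reach `feat`
print(q.shape, idx[:5])
```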
https://arxiv.org/abs/2404.05253
This paper introduces the physics-inspired synthesized underwater image dataset (PHISWID), a dataset tailored for enhancing underwater image processing through physics-inspired image synthesis. Deep learning approaches to underwater image enhancement typically demand extensive datasets, yet acquiring paired clean and degraded underwater ones poses significant challenges. While several underwater image datasets have been proposed using physics-based synthesis, a publicly accessible collection has been lacking. Additionally, most underwater image synthesis approaches do not intend to reproduce atmospheric scenes, resulting in incomplete enhancement. PHISWID addresses this gap by offering a set of paired ground-truth (atmospheric) and synthetically degraded underwater images, showcasing not only color degradation but also the often-neglected effects of marine snow, a composite of organic matter and sand particles that considerably impairs underwater image clarity. The dataset applies these degradations to atmospheric RGB-D images, enhancing the dataset's realism and applicability. PHISWID is particularly valuable for training deep neural networks in a supervised learning setting and for objectively assessing image quality in benchmark analyses. Our results reveal that even a basic U-Net architecture, when trained with PHISWID, substantially outperforms existing methods in underwater image enhancement. We intend to release PHISWID publicly, contributing a significant resource to the advancement of underwater imaging technology.
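Physics-inspired synthesis from RGB-D follows the standard underwater image formation model I = J·e^(−βd) + B·(1 − e^(−βd)) with wavelength-dependent attenuation. The sketch below uses illustrative coefficients; PHISWID's synthesis, including marine snow, is richer:

```python
# A minimal sketch of physics-based underwater degradation applied to an
# RGB-D image, in the spirit of PHISWID. Coefficients are illustrative.
import numpy as np

def synthesize_underwater(rgb: np.ndarray, depth: np.ndarray,
                          beta=(0.4, 0.15, 0.1),        # R, G, B attenuation
                          backlight=(0.05, 0.35, 0.45)  # veiling light color
                          ) -> np.ndarray:
    """rgb in [0,1], shape (H, W, 3); depth in meters, shape (H, W)."""
    beta = np.asarray(beta, dtype=np.float32)
    B = np.asarray(backlight, dtype=np.float32)
    t = np.exp(-beta[None, None, :] * depth[..., None])   # transmission map
    return rgb * t + B[None, None, :] * (1.0 - t)

rgb = np.random.rand(120, 160, 3).astype(np.float32)     # "atmospheric" image
depth = np.random.uniform(0.5, 10.0, (120, 160)).astype(np.float32)
degraded = synthesize_underwater(rgb, depth)             # paired training input
print(degraded.shape, degraded.min() >= 0, degraded.max() <= 1)
```

Red attenuates fastest underwater, so the largest β is assigned to the red channel; the output is a per-channel convex combination of scene radiance and backscattered light.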
https://arxiv.org/abs/2404.03998
Many existing methods for low-light image enhancement (LLIE) based on Retinex theory ignore important factors that affect the validity of this theory in digital imaging, such as noise, quantization error, non-linearity, and dynamic range overflow. In this paper, we propose a new expression called Digital-Imaging Retinex theory (DI-Retinex) through theoretical and experimental analysis of Retinex theory in digital imaging. Our new expression includes an offset term in the enhancement model, which allows for pixel-wise brightness contrast adjustment with a non-linear mapping function. In addition, to solve the low-light enhancement problem in an unsupervised manner, we propose an image-adaptive masked reverse degradation loss in Gamma space. We also design a variance suppression loss for regulating the additional offset term. Extensive experiments show that our proposed method outperforms all existing unsupervised methods in terms of visual quality, model size, and speed. Our algorithm can also assist downstream face detectors in low light, as it shows the largest performance gain after low-light enhancement compared to other methods.
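The enhancement model with an additive offset term can be sketched as a tiny network predicting a per-pixel gain and offset. The parameterization below is our assumption; the paper's exact expression and its Gamma-space losses differ:

```python
# A hedged sketch of a gain-plus-offset enhancement model of the kind
# DI-Retinex argues for: y = a(x) * x + b(x), clamped to valid range.
import torch
import torch.nn as nn

class GainOffsetEnhancer(nn.Module):
    """Tiny network predicting a pixel-wise gain a(x) and offset b(x)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 6, 3, padding=1),       # 3 gain + 3 offset maps
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gain, offset = self.net(x).chunk(2, dim=1)
        gain = torch.sigmoid(gain) * 4.0          # brightness amplification
        offset = torch.tanh(offset) * 0.1         # small additive offset
        return (gain * x + offset).clamp(0.0, 1.0)

low_light = torch.rand(1, 3, 64, 64) * 0.2       # dim input in [0, 0.2]
print(GainOffsetEnhancer()(low_light).shape)
```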
https://arxiv.org/abs/2404.03327
We present a new additive image factorization technique that treats images as composed of multiple latent specular components, which can be estimated recursively simply by modulating the sparsity during decomposition. Our model-driven {\em RSFNet} estimates these factors by unrolling the optimization into network layers, requiring only a few scalars to be learned. The resultant factors are interpretable by design and can be fused for different image enhancement tasks via a network or combined directly by the user in a controllable fashion. Based on RSFNet, we detail a zero-reference Low Light Enhancement (LLE) application trained without paired or unpaired supervision. Our system improves the state-of-the-art performance on standard benchmarks and achieves better generalization on multiple other datasets. We also integrate our factors with other task-specific fusion networks for applications like deraining, deblurring, and dehazing with negligible overhead, thereby highlighting the multi-domain and multi-task generalizability of our proposed RSFNet. The code and data are released for reproducibility on the project homepage.
https://arxiv.org/abs/2404.01998
In this paper we propose a novel modification of Contrastive Language-Image Pre-Training (CLIP) guidance for the task of unsupervised backlit image enhancement. Our work builds on the state-of-the-art CLIP-LIT approach, which learns a prompt pair by constraining the text-image similarity between a prompt (negative/positive sample) and a corresponding image (backlit image/well-lit image) in the CLIP embedding space. Learned prompts then guide an image enhancement network. Based on the CLIP-LIT framework, we propose two novel methods for CLIP guidance. First, we show that instead of tuning prompts in the space of text embeddings, it is possible to directly tune their embeddings in the latent space without any loss in quality. This accelerates training and potentially enables the use of additional encoders that do not have a text encoder. Second, we propose a novel approach that does not require any prompt tuning. Instead, based on CLIP embeddings of backlit and well-lit images from training data, we compute the residual vector in the embedding space as a simple difference between the mean embeddings of the well-lit and backlit images. This vector then guides the enhancement network during training, pushing a backlit image towards the space of well-lit images. This approach further dramatically reduces training time, stabilizes training and produces high quality enhanced images without artifacts, both in supervised and unsupervised training regimes. Additionally, we show that residual vectors can be interpreted, revealing biases in training data, and thereby enabling potential bias correction.
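The residual-vector guidance is simple enough to state in a few lines: the direction from backlit to well-lit images in CLIP space is the difference of mean embeddings, and training rewards movement along it. CLIP itself is abstracted away here; any encoder producing (N, D) embeddings fits:

```python
# A minimal sketch of the paper's residual-vector guidance, with random
# tensors standing in for CLIP image embeddings.
import torch
import torch.nn.functional as F

def residual_vector(well_lit_emb: torch.Tensor,
                    backlit_emb: torch.Tensor) -> torch.Tensor:
    """Both inputs: (N, D) image embeddings from the training data."""
    v = well_lit_emb.mean(0) - backlit_emb.mean(0)
    return F.normalize(v, dim=0)

def guidance_loss(enhanced_emb: torch.Tensor, backlit_emb: torch.Tensor,
                  v: torch.Tensor) -> torch.Tensor:
    """Reward movement of the enhanced image along the residual direction."""
    delta = F.normalize(enhanced_emb - backlit_emb, dim=1)
    return (1.0 - delta @ v).mean()

well = F.normalize(torch.randn(100, 512), dim=1)   # stand-ins for CLIP feats
back = F.normalize(torch.randn(100, 512), dim=1)
v = residual_vector(well, back)
print(guidance_loss(back + 0.1 * v, back, v).item())  # near 0 when aligned
```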
https://arxiv.org/abs/2404.01889
Recent image tone adjustment (or enhancement) approaches have predominantly adopted supervised learning for learning human-centric perceptual assessment. However, these approaches are constrained by intrinsic challenges of supervised learning. Primarily, the requirement for expertly-curated or retouched images escalates the data acquisition expenses. Moreover, their coverage of target style is confined to stylistic variants inferred from the training data. To surmount the above challenges, we propose CLIPtone, an unsupervised learning-based approach to text-based image tone adjustment that extends an existing image enhancement method to accommodate natural language descriptions. Specifically, we design a hyper-network to adaptively modulate the pretrained parameters of the backbone model based on a text description. To assess whether the adjusted image aligns with the text description without a ground-truth image, we utilize CLIP, which is trained on a vast set of language-image pairs and thus encompasses knowledge of human perception. The major advantages of our approach are threefold: (i) minimal data collection expenses, (ii) support for a range of adjustments, and (iii) the ability to handle novel text descriptions unseen in training. Our approach's efficacy is demonstrated through comprehensive experiments, including a user study.
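The hyper-network idea can be sketched as a text embedding predicting per-channel scales that modulate a pretrained layer's features. Names and sizes below are ours; CLIPtone's actual modulation of backbone parameters is more specific:

```python
# A hedged sketch of text-conditioned modulation: a hyper-network maps a
# text embedding (e.g., from CLIP's text encoder) to per-channel scales.
import torch
import torch.nn as nn

class TextModulatedConv(nn.Module):
    def __init__(self, channels: int = 32, text_dim: int = 512):
        super().__init__()
        self.conv = nn.Conv2d(3, channels, 3, padding=1)   # backbone layer
        self.hyper = nn.Linear(text_dim, channels)         # hyper-network

    def forward(self, img: torch.Tensor, text_emb: torch.Tensor):
        scale = 1.0 + torch.tanh(self.hyper(text_emb))     # (B, C)
        feat = self.conv(img)
        return feat * scale[:, :, None, None]              # text-conditioned

img = torch.rand(2, 3, 64, 64)
text_emb = torch.randn(2, 512)          # "make it warmer", encoded by CLIP
print(TextModulatedConv()(img, text_emb).shape)
```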
https://arxiv.org/abs/2404.01123