Novel view synthesis (NVS) aims to generate images at arbitrary viewpoints using multi-view images, and recent insights from neural radiance fields (NeRF) have contributed to remarkable improvements. Recently, studies on generalizable NeRF (G-NeRF) have addressed the challenge of per-scene optimization in NeRFs. The on-the-fly construction of radiance fields in G-NeRF simplifies the NVS process, making it well-suited for real-world applications. However, G-NeRF still struggles to represent fine details of a specific scene due to the absence of per-scene optimization, even with texture-rich multi-view source inputs. As a remedy, we propose a Geometry-driven Multi-reference Texture transfer network (GMT), a plug-and-play module designed for G-NeRF. Specifically, we propose ray-imposed deformable convolution (RayDCN), which aligns input and reference features while reflecting scene geometry. Additionally, the proposed texture-preserving transformer (TP-Former) aggregates multi-view source features while preserving texture information. Consequently, our module enables direct interaction between adjacent pixels during the image enhancement process, which is lacking in G-NeRF models that render each pixel independently. This addresses constraints that hinder the ability to capture high-frequency details. Experiments show that our plug-and-play module consistently improves G-NeRF models on various benchmark datasets.
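As a rough illustration of the ray-guided alignment idea, the sketch below predicts deformable-convolution offsets from concatenated target, reference, and ray (geometry) features and uses them to resample the reference feature map. The module name, feature shapes, and offset head are assumptions for illustration only, not the authors' RayDCN implementation.

```python
# Hypothetical sketch of a ray-guided deformable alignment block (not the authors' code).
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class RayGuidedDeformAlign(nn.Module):
    """Predicts deformable-conv offsets from concatenated target/reference/ray features,
    then samples the reference feature map accordingly (a rough reading of RayDCN)."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        self.weight = nn.Parameter(torch.randn(channels, channels, kernel_size, kernel_size) * 0.01)
        # 2 offsets (dy, dx) per kernel position, predicted from geometry-aware features.
        self.offset_head = nn.Conv2d(channels * 3, 2 * kernel_size * kernel_size, 3, padding=1)

    def forward(self, target_feat, ref_feat, ray_feat):
        offsets = self.offset_head(torch.cat([target_feat, ref_feat, ray_feat], dim=1))
        return deform_conv2d(ref_feat, offsets, self.weight, padding=self.kernel_size // 2)

feat = lambda: torch.randn(1, 32, 64, 64)
aligned = RayGuidedDeformAlign(32)(feat(), feat(), feat())
print(aligned.shape)  # torch.Size([1, 32, 64, 64])
```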
https://arxiv.org/abs/2410.00672
Underwater images often suffer from quality degradation due to absorption and scattering effects. Most existing underwater image enhancement algorithms produce a single, fixed-color image, limiting user flexibility and application. To address this limitation, we propose a method called \textit{ColorCode}, which enhances underwater images while offering a range of controllable color outputs. Our approach involves recovering an underwater image to a reference enhanced image through supervised training and decomposing it into color and content codes via self-reconstruction and cross-reconstruction. The color code is explicitly constrained to follow a Gaussian distribution, allowing for efficient sampling and interpolation during inference. ColorCode offers three key features: 1) color enhancement, producing an enhanced image with a fixed color; 2) color adaptation, enabling controllable adjustments of long-wavelength color components using guidance images; and 3) color interpolation, allowing for the smooth generation of multiple colors through continuous sampling of the color code. Quantitative and visual evaluations on popular and challenging benchmark datasets demonstrate the superiority of ColorCode over existing methods in providing diverse, controllable, and color-realistic enhancement results. The source code is available at this https URL.
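To make the code-decomposition idea concrete, here is a minimal VAE-style sketch in which a global color code is regularized toward N(0, I) so it can later be sampled or interpolated. The network sizes and module names are hypothetical and far simpler than the released ColorCode model.

```python
# A minimal sketch (assumed structure, not the released ColorCode implementation) of
# decomposing an image into a content code and a Gaussian-constrained color code.
import torch
import torch.nn as nn

class ColorContentEncoder(nn.Module):
    def __init__(self, color_dim: int = 8):
        super().__init__()
        self.content = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(32, 32, 3, padding=1))
        self.color_mu = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, color_dim))
        self.color_logvar = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, color_dim))

    def forward(self, x):
        mu, logvar = self.color_mu(x), self.color_logvar(x)
        color = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        # KL term pushes the color code toward N(0, I), enabling sampling/interpolation later.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return self.content(x), color, kl

enc = ColorContentEncoder()
content, color, kl = enc(torch.rand(2, 3, 128, 128))
# At inference, new colors come from color ~ N(0, I) or from interpolating two codes.
mixed = 0.5 * color[0] + 0.5 * color[1]
```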
https://arxiv.org/abs/2409.19685
The majority of deep learning methods utilize vanilla convolution for enhancing underwater images. While vanilla convolution excels in capturing local features and learning the spatial hierarchical structure of images, it tends to smooth input images, which can somewhat limit feature expression and modeling. A prominent characteristic of underwater degraded images is blur, and the goal of enhancement is to make the textures and details (high-frequency features) in the images more visible. Therefore, we believe that leveraging high-frequency features can improve enhancement performance. To address this, we introduce Pixel Difference Convolution (PDC), which focuses on gradient information with significant changes in the image, thereby improving the modeling of enhanced images. We propose an underwater image enhancement network, PDCFNet, based on PDC and cross-level feature fusion. Specifically, we design a detail enhancement module based on PDC that employs parallel PDCs to capture high-frequency features, leading to better detail and texture enhancement. The designed cross-level feature fusion module performs operations such as concatenation and multiplication on features from different levels, ensuring sufficient interaction and enhancement between diverse features. Our proposed PDCFNet achieves a PSNR of 27.37 and an SSIM of 92.02 on the UIEB dataset, attaining the best performance to date. Our code is available at this https URL.
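For readers unfamiliar with pixel-difference operators, the snippet below implements one common variant (central difference), where the convolution responds to local intensity differences rather than absolute values. The paper's parallel PDC branches may use different difference patterns, so treat this only as an illustrative sketch.

```python
# Sketch of a central pixel-difference convolution; the paper's parallel PDC operators
# may differ in detail, so this is an illustrative reparameterization, not their code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralPixelDifferenceConv(nn.Module):
    """y(p) = sum_k w_k * (x(p + k) - x(p)) = conv(x, W) - conv(x, sum(W) as a 1x1 kernel)."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.05)
        self.padding = kernel_size // 2

    def forward(self, x):
        vanilla = F.conv2d(x, self.weight, padding=self.padding)
        center = F.conv2d(x, self.weight.sum(dim=(2, 3), keepdim=True))
        return vanilla - center  # responds to local gradients rather than absolute intensity

out = CentralPixelDifferenceConv(3, 16)(torch.rand(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 16, 64, 64])
```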
https://arxiv.org/abs/2409.19269
Low-light image enhancement (LIE) aims at precisely and efficiently recovering an image degraded in poor illumination environments. Recent advanced LIE techniques use deep neural networks, which require large numbers of low-/normal-light image pairs, many network parameters, and substantial computational resources. As a result, their practicality is limited. In this work, we devise a novel unsupervised LIE framework based on diffusion priors and lookup tables (DPLUT) to achieve efficient low-light image recovery. The proposed approach comprises two critical components: a light adjustment lookup table (LLUT) and a noise suppression lookup table (NLUT). LLUT is optimized with a set of unsupervised losses. It aims at predicting pixel-wise curve parameters for the dynamic range adjustment of a specific image. NLUT is designed to remove the noise amplified after brightening. As diffusion models are sensitive to noise, diffusion priors are introduced to achieve high-performance noise suppression. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in terms of visual quality and efficiency.
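The curve-parameter idea can be illustrated in a few lines: a per-pixel parameter map bends a monotonic adjustment curve that is applied iteratively to the input. The Zero-DCE-style formulation below is used purely for illustration; the paper realizes the adjustment with lookup tables, which are not reproduced here.

```python
# A hedged sketch of pixel-wise curve adjustment (Zero-DCE-style quadratic curves);
# the paper's LLUT realizes this with lookup tables instead.
import torch

def apply_curves(x: torch.Tensor, alpha: torch.Tensor, iterations: int = 4) -> torch.Tensor:
    """x, alpha: (N, 3, H, W) in [0, 1]; alpha in [-1, 1] are per-pixel curve parameters."""
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)   # monotonic curve that brightens dark pixels
    return x.clamp(0.0, 1.0)

low_light = torch.rand(1, 3, 64, 64) * 0.2           # a dark image
alpha = torch.full_like(low_light, 0.8)               # would normally be predicted per pixel
print(apply_curves(low_light, alpha).mean() > low_light.mean())  # tensor(True)
```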
https://arxiv.org/abs/2409.18899
Underwater vision is crucial for autonomous underwater vehicles (AUVs), and enhancing degraded underwater images in real time on a resource-constrained AUV is a key challenge, both because of factors such as light absorption and scattering and because of the substantial model complexity required to compensate for them. Traditional image enhancement techniques lack adaptability to varying underwater conditions, while learning-based methods, particularly those using convolutional neural networks (CNNs) and generative adversarial networks (GANs), offer more robust solutions but face limitations such as inadequate enhancement, unstable training, or mode collapse. Denoising diffusion probabilistic models (DDPMs) have emerged as a state-of-the-art approach in image-to-image tasks, but the recent UW-DDPM solution requires intensive computation to achieve the desired underwater image enhancement (UIE). To address these challenges, this paper introduces UW-DiffPhys, a novel physics-based and diffusion-based UIE approach. UW-DiffPhys combines computationally light, physics-based UIE network components with a denoising U-Net to replace the computationally intensive distribution transformation U-Net in the existing UW-DDPM framework, reducing complexity while maintaining performance. Additionally, the Denoising Diffusion Implicit Model (DDIM) is employed to accelerate the inference process through non-Markovian sampling. Experimental results demonstrate that UW-DiffPhys achieves a substantial reduction in computational complexity and inference time compared to UW-DDPM, with competitive performance in key metrics such as PSNR, SSIM, and UCIQE, and an improvement in the overall underwater image quality metric UIQM. The implementation code can be found at the following repository: this https URL
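The DDIM acceleration mentioned above boils down to a deterministic, non-Markovian update that can jump across many timesteps. The generic step below (with eta = 0) is shown for context and is not taken from the UW-DiffPhys code.

```python
# A generic DDIM (eta = 0) sampling step, included to illustrate why non-Markovian
# sampling needs far fewer steps than DDPM; this is not the UW-DiffPhys implementation.
import torch

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update from timestep t to the previous (possibly distant) timestep."""
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps_pred) / alpha_bar_t.sqrt()
    return alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps_pred

x_t = torch.randn(1, 3, 64, 64)
eps_pred = torch.randn_like(x_t)                      # would come from the denoising U-Net
x_prev = ddim_step(x_t, eps_pred, torch.tensor(0.5), torch.tensor(0.8))
```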
https://arxiv.org/abs/2409.18476
Cone Beam Computed Tomography (CBCT) finds diverse applications in medicine. Ensuring high image quality in CBCT scans is essential for accurate diagnosis and treatment delivery. Yet, the susceptibility of CBCT images to noise and artifacts undermines both their usefulness and reliability. Existing methods typically address CBCT artifacts through image-to-image translation approaches. These methods, however, are limited by the artifact types present in the training data, which may not cover the complete spectrum of CBCT degradations stemming from variations in imaging protocols. Gathering additional data to encompass all possible scenarios can often pose a challenge. To address this, we present SinoSynth, a physics-based degradation model that simulates various CBCT-specific artifacts to generate a diverse set of synthetic CBCT images from high-quality CT images without requiring pre-aligned data. Through extensive experiments, we demonstrate that several different generative networks trained on our synthesized data achieve remarkable results on heterogeneous multi-institutional datasets, outperforming even the same networks trained on actual data. We further show that our degradation model conveniently provides an avenue to enforce anatomical constraints in conditional generative models, yielding high-quality and structure-preserving synthetic CT images.
https://arxiv.org/abs/2409.18355
Despite the impressive advancements made in recent low-light image enhancement techniques, the scarcity of paired data has emerged as a significant obstacle to further progress. This work proposes a mean-teacher-based semi-supervised low-light enhancement (Semi-LLIE) framework that integrates unpaired data into model training. The mean-teacher technique is a prominent semi-supervised learning method, successfully adopted for addressing high-level and low-level vision tasks. However, two primary issues hinder the naive mean-teacher method from attaining optimal performance in low-light image enhancement. Firstly, pixel-wise consistency loss is insufficient for transferring realistic illumination distribution from the teacher to the student model, which results in color cast in the enhanced images. Secondly, cutting-edge image enhancement approaches fail to cooperate effectively with the mean-teacher framework to restore detailed information in dark areas because they tend to overlook modeling structured information within local regions. To mitigate the above issues, we first introduce a semantic-aware contrastive loss to faithfully transfer the illumination distribution, contributing to enhancing images with natural colors. Then, we design a Mamba-based low-light image enhancement backbone to effectively enhance Mamba's local-region pixel relationship representation ability with a multi-scale feature learning scheme, facilitating the generation of images with rich textural details. Further, we propose a novel perceptive loss based on the large-scale vision-language Recognize Anything Model (RAM) to help generate enhanced images with richer textural details. The experimental results indicate that our Semi-LLIE surpasses existing methods in both quantitative and qualitative metrics.
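For context, the mean-teacher backbone amounts to an EMA-updated teacher plus a consistency loss on unpaired images, as in the hedged sketch below. The semantic-aware contrastive loss, Mamba backbone, and RAM-based loss of Semi-LLIE are intentionally omitted, and the stand-in network is a placeholder.

```python
# Minimal mean-teacher scaffolding (EMA teacher + consistency on unpaired data);
# the Semi-LLIE losses and backbone are omitted, and the network is a placeholder.
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum: float = 0.999):
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

student = torch.nn.Conv2d(3, 3, 3, padding=1)         # stand-in for the enhancement network
teacher = copy.deepcopy(student).requires_grad_(False)

unlabeled = torch.rand(2, 3, 64, 64)
consistency = F.l1_loss(student(unlabeled), teacher(unlabeled + 0.01 * torch.randn_like(unlabeled)))
consistency.backward()      # drives the student toward the teacher's prediction
ema_update(teacher, student)
```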
https://arxiv.org/abs/2409.16604
Adversarial attacks in computer vision exploit the vulnerabilities of machine learning models by introducing subtle perturbations to input data, often leading to incorrect predictions or classifications. These attacks have grown in sophistication with the advent of deep learning, presenting significant challenges in critical applications, which can be harmful for society. However, there is also a rich line of research from a transformative perspective that leverages adversarial techniques for social good. Specifically, we examine the rise of proactive schemes: methods that encrypt input data with additional signals, termed templates, to enhance the performance of deep learning models. By embedding these imperceptible templates into digital media, proactive schemes are applied across various applications, from simple image enhancement to complicated deep learning frameworks, to aid performance; in contrast, passive schemes do not alter the input data distribution. The survey delves into the methodologies behind these proactive schemes, the encryption and learning processes, and their use in modern computer vision and natural language processing applications. Additionally, it discusses the challenges, potential vulnerabilities, and future directions for proactive schemes, ultimately highlighting their potential to foster the responsible and secure advancement of deep learning technologies.
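As a toy illustration of the template idea (not any specific scheme from the survey), a learned, bounded perturbation can be added to inputs before they reach the downstream model:

```python
# Toy illustration of a proactive scheme: an imperceptible learned template added to the
# input before the downstream model; the surveyed schemes use far richer encryption.
import torch

template = torch.nn.Parameter(torch.zeros(1, 3, 224, 224))   # learned jointly with the task loss

def protect(image: torch.Tensor, strength: float = 8 / 255) -> torch.Tensor:
    """Embed a bounded, near-invisible template into the image."""
    return (image + strength * template.tanh()).clamp(0, 1)

protected = protect(torch.rand(1, 3, 224, 224))
```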
https://arxiv.org/abs/2409.16491
Low-light images are commonly encountered in real-world scenarios, and numerous low-light image enhancement (LLIE) methods have been proposed to improve the visibility of these images. The primary goal of LLIE is to generate clearer images that are more visually pleasing to humans. However, the impact of LLIE methods in high-level vision tasks, such as image classification and object detection, which rely on high-quality image datasets, is not well explored. To explore the impact, we comprehensively evaluate LLIE methods on these high-level vision tasks by utilizing an empirical investigation comprising image classification and object detection experiments. The evaluation reveals a dichotomy: while LLIE methods enhance human visual interpretation, their effect on computer vision tasks is inconsistent and can sometimes be harmful. Our findings suggest a disconnect between image enhancement for human visual perception and for machine analysis, indicating a need for LLIE methods tailored to support high-level vision tasks effectively. This insight is crucial for the development of LLIE techniques that align with the needs of both human and machine vision.
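The evaluation protocol is essentially enhance-then-recognize; a hedged sketch of the classification side is given below, with the classifier, enhancer, and data loader left as placeholders rather than the paper's actual experimental setup.

```python
# Sketch of the enhance-then-classify protocol used to probe whether LLIE helps
# downstream recognition (models and data here are placeholders).
import torch

@torch.no_grad()
def accuracy_with_enhancement(classifier, enhancer, loader):
    correct = total = 0
    for images, labels in loader:
        inputs = enhancer(images) if enhancer is not None else images
        preds = classifier(inputs).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / max(total, 1)

# Toy demo with a random "dataset" and an identity-like enhancer; in practice one would
# compare accuracy with enhancer=None against each LLIE method on a low-light test set.
fake_loader = [(torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,)))]
classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
print(accuracy_with_enhancement(classifier, lambda x: x.clamp(0, 1), fake_loader))
```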
https://arxiv.org/abs/2409.14461
We propose FD3, a fundus image enhancement method based on direct diffusion bridges, which can cope with a wide range of complex degradations, including haze, blur, noise, and shadow. We first propose a synthetic forward model through a human feedback loop with board-certified ophthalmologists for maximal quality improvement of low-quality in-vivo images. Using the proposed forward model, we train a robust and flexible diffusion-based image enhancement network that is highly effective as a stand-alone method, unlike previous diffusion model-based approaches which act only as a refiner on top of pre-trained models. Through extensive experiments, we show that FD3 establishes superior quality not only on synthetic degradations but also on in vivo studies with low-quality fundus photos taken from patients with cataracts or small pupils. To promote further research in this area, we open-source all our code and data used for this research at this https URL
https://arxiv.org/abs/2409.12377
Ultrasound imaging, despite its widespread use in medicine, often suffers from various sources of noise and artifacts that impact the signal-to-noise ratio and overall image quality. Enhancing ultrasound images requires a delicate balance between contrast, resolution, and speckle preservation. This paper introduces a novel approach that integrates adaptive beamforming with denoising diffusion-based variance imaging to address this challenge. By applying Eigenspace-Based Minimum Variance (EBMV) beamforming and employing a denoising diffusion model fine-tuned on ultrasound data, our method computes the variance across multiple diffusion-denoised samples to produce high-quality despeckled images. This approach leverages both the inherent multiplicative noise of ultrasound and the stochastic nature of diffusion models. Experimental results on a publicly available dataset demonstrate the effectiveness of our method in achieving superior image reconstructions from single plane-wave acquisitions. The code is available at: this https URL.
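The variance-imaging step can be summarized in a few lines: run the diffusion denoiser several times with independent noise draws and keep the per-pixel variance, which suppresses speckle while retaining structure. The denoiser below is a trivial stand-in, not the fine-tuned model from the paper.

```python
# Sketch of the variance-over-samples idea; the denoiser is a toy stand-in for the
# fine-tuned diffusion model, and the noise scale is an arbitrary illustration value.
import torch

def diffusion_variance_image(denoise_fn, beamformed, num_samples: int = 8) -> torch.Tensor:
    samples = torch.stack([denoise_fn(beamformed + 0.1 * torch.randn_like(beamformed))
                           for _ in range(num_samples)])
    return samples.var(dim=0)   # speckle (multiplicative noise) averages out; structure remains

fake_denoiser = lambda x: x.clamp(0, 1)
variance_img = diffusion_variance_image(fake_denoiser, torch.rand(1, 1, 128, 128))
```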
https://arxiv.org/abs/2409.11380
Retinal fundus photography is significant in diagnosing and monitoring retinal diseases. However, systemic imperfections and operator/patient-related factors can hinder the acquisition of high-quality retinal images. Previous efforts in retinal image enhancement primarily relied on GANs, which are limited by the trade-off between training stability and output diversity. In contrast, the Schrödinger Bridge (SB) offers a more stable solution by utilizing Optimal Transport (OT) theory to model a stochastic differential equation (SDE) between two arbitrary distributions. This allows SB to effectively transform low-quality retinal images into their high-quality counterparts. In this work, we leverage the SB framework to propose an image-to-image translation pipeline for retinal image enhancement. Additionally, previous methods often fail to capture fine structural details, such as blood vessels. To address this, we enhance our pipeline by introducing Dynamic Snake Convolution, whose tortuous receptive field can better preserve tubular structures. We name the resulting retinal fundus image enhancement framework the Context-aware Unpaired Neural Schrödinger Bridge (CUNSB-RFIE). To the best of our knowledge, this is the first endeavor to use the SB approach for retinal image enhancement. Experimental results on a large-scale dataset demonstrate the advantage of the proposed method compared to several state-of-the-art supervised and unsupervised methods in terms of image quality and performance on downstream tasks. The code is available at this https URL.
https://arxiv.org/abs/2409.10966
With the rapid development of marine engineering projects such as marine resource extraction and oceanic surveys, underwater visual imaging and analysis has become a critical technology. Unfortunately, due to the inevitable non-linear attenuation of light in underwater environments, underwater images and videos often suffer from low contrast, blurriness, and color degradation, which significantly complicate the subsequent research. Existing underwater image enhancement methods often treat the haze and color cast as a unified degradation process and disregard their independence and interdependence, which limits the performance improvement. Here, we propose a Vision Transformer (ViT)-based network (referred to as WaterFormer) to improve the underwater image quality. WaterFormer contains three major components: a dehazing block (DehazeFormer Block) to capture the self-correlated haze features and extract deep-level features, a Color Restoration Block (CRB) to capture self-correlated color cast features, and a Channel Fusion Block (CFB) to capture fusion features within the network. To ensure authenticity, a soft reconstruction layer based on the underwater imaging physics model is included. To improve the quality of the enhanced images, we introduce the Chromatic Consistency Loss and Sobel Color Loss to train the network. Comprehensive experimental results demonstrate that WaterFormer outperforms other state-of-the-art methods in enhancing underwater images.
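The physics model referenced by the soft reconstruction layer is the standard underwater image formation equation I = J * t + B * (1 - t). The sketch below inverts it given a predicted transmission map and background light, though WaterFormer's exact parameterization may differ.

```python
# Inversion of the underwater image formation model I = J * t + B * (1 - t);
# WaterFormer's soft reconstruction layer may parameterize this differently.
import torch

def soft_reconstruction(i: torch.Tensor, transmission: torch.Tensor,
                        backscatter: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Recover the scene radiance J given predicted transmission t and background light B."""
    t = transmission.clamp(min=eps)
    return ((i - backscatter * (1.0 - t)) / t).clamp(0.0, 1.0)

i = torch.rand(1, 3, 64, 64)                            # degraded underwater image
t = torch.full((1, 1, 64, 64), 0.6)                     # network-predicted transmission map
b = torch.tensor([0.1, 0.4, 0.5]).view(1, 3, 1, 1)      # bluish-green background light
j = soft_reconstruction(i, t, b)
```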
https://arxiv.org/abs/2409.09779
Retinal fundus photography offers a non-invasive way to diagnose and monitor a variety of retinal diseases, but is prone to inherent quality glitches arising from systemic imperfections or operator/patient-related factors. However, high-quality retinal images are crucial for carrying out accurate diagnoses and automated analyses. The fundus image enhancement is typically formulated as a distribution alignment problem, by finding a one-to-one mapping between a low-quality image and its high-quality counterpart. This paper proposes a context-informed optimal transport (OT) learning framework for tackling unpaired fundus image enhancement. In contrast to standard generative image enhancement methods, which struggle with handling contextual information (e.g., over-tampered local structures and unwanted artifacts), the proposed context-aware OT learning paradigm better preserves local structures and minimizes unwanted artifacts. Leveraging deep contextual features, we derive the proposed context-aware OT using the earth mover's distance and show that the proposed context-OT has a solid theoretical guarantee. Experimental results on a large-scale dataset demonstrate the superiority of the proposed method over several state-of-the-art supervised and unsupervised methods in terms of signal-to-noise ratio, structural similarity index, as well as two downstream tasks. The code is available at this https URL.
https://arxiv.org/abs/2409.07862
The emergence of text-to-image generation models has led to the recognition that image enhancement, performed as post-processing, would significantly improve the visual quality of the generated images. Exploring diffusion models to enhance the generated images is nevertheless not trivial and necessitates delicately enriching plentiful details while preserving the visual appearance of key content in the original image. In this paper, we propose a novel framework, namely FreeEnhance, for content-consistent image enhancement using off-the-shelf image diffusion models. Technically, FreeEnhance is a two-stage process that first adds random noise to the input image and then capitalizes on a pre-trained image diffusion model (i.e., Latent Diffusion Models) to denoise and enhance the image details. In the noising stage, FreeEnhance is devised to add lighter noise to regions with higher frequency to preserve the high-frequency patterns (e.g., edges, corners) in the original image. In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality. Extensive experiments conducted on the HPDv2 dataset demonstrate that our FreeEnhance outperforms state-of-the-art image enhancement models in terms of quantitative metrics and human preference. More remarkably, FreeEnhance also shows higher human preference compared to the commercial image enhancement solution Magnific AI.
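One plausible reading of the noising stage is to scale the added noise by an inverse high-frequency measure, so that edges and corners receive lighter noise. The Laplacian-based weighting below is an assumption for illustration, not the authors' exact scheme.

```python
# Hedged sketch of frequency-aware noising: weight the added noise inversely to a local
# high-frequency measure (here a Laplacian magnitude), so edges/corners keep less noise.
import torch
import torch.nn.functional as F

def frequency_aware_noise(img: torch.Tensor, base_sigma: float = 0.3) -> torch.Tensor:
    lap_kernel = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)
    gray = img.mean(dim=1, keepdim=True)
    high_freq = F.conv2d(gray, lap_kernel, padding=1).abs()
    weight = 1.0 - high_freq / (high_freq.amax() + 1e-8)   # lighter noise on high-frequency regions
    return img + base_sigma * weight * torch.randn_like(img)

noised = frequency_aware_noise(torch.rand(1, 3, 64, 64))
```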
https://arxiv.org/abs/2409.07451
Low-light image enhancement, particularly in cross-domain tasks such as mapping from the raw domain to the sRGB domain, remains a significant challenge. Many deep learning-based methods have been developed to address this issue and have shown promising results in recent years. However, single-stage methods, which attempt to unify the complex mapping across both domains, suffer from limited denoising performance. In contrast, two-stage approaches typically decompose a raw image with color filter arrays (CFA) into a four-channel RGGB format before feeding it into a neural network. However, this strategy overlooks the critical role of demosaicing within the Image Signal Processing (ISP) pipeline, leading to color distortions under varying lighting conditions, especially in low-light scenarios. To address these issues, we design a novel Mamba scanning mechanism, called RAWMamba, to effectively handle raw images with different CFAs. Furthermore, we present a Retinex Decomposition Module (RDM) grounded in the Retinex prior, which decouples illumination from reflectance to facilitate more effective denoising and automatic non-linear exposure correction. By bridging demosaicing and denoising, better raw image enhancement is achieved. Experimental evaluations conducted on the public datasets SID and MCR demonstrate that our proposed RAWMamba achieves state-of-the-art performance on cross-domain mapping.
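For reference, the conventional two-stage preprocessing criticized above packs the Bayer mosaic into four half-resolution RGGB channels before the network, as in this short sketch (RAWMamba instead operates on the raw CFA data directly):

```python
# The conventional RGGB packing used by two-stage pipelines, shown only for context;
# the pattern offsets assume an RGGB Bayer layout.
import torch

def pack_rggb(raw: torch.Tensor) -> torch.Tensor:
    """raw: (N, 1, H, W) Bayer mosaic with an RGGB pattern -> (N, 4, H/2, W/2)."""
    r  = raw[:, :, 0::2, 0::2]
    g1 = raw[:, :, 0::2, 1::2]
    g2 = raw[:, :, 1::2, 0::2]
    b  = raw[:, :, 1::2, 1::2]
    return torch.cat([r, g1, g2, b], dim=1)

packed = pack_rggb(torch.rand(1, 1, 256, 256))
print(packed.shape)  # torch.Size([1, 4, 128, 128])
```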
https://arxiv.org/abs/2409.07040
The primary purpose of this paper is to present the concept of dichotomy in image illumination modeling based on the power function. In particular, we review several mathematical properties of the power function to identify the limitations and propose a new mathematical model capable of abstracting illumination dichotomy. The simplicity of the equation opens new avenues for classical and modern image analysis and processing. The article provides practical and illustrative image examples to explain how the new model manages dichotomy in image perception. The article shows dichotomy image space as a viable way to extract rich information from images despite poor contrast linked to tone, lightness, and color perception. Moreover, a comparison with state-of-the-art methods in image enhancement provides evidence of the method's value.
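Although the abstract does not spell out the proposed model, it builds on the classical power-law (gamma) transform, recalled below for reference; the paper's specific dichotomy formulation is not reproduced here.

```latex
% Classical power-law (gamma) transform on normalized intensities; gamma < 1 brightens
% (expands dark tones) while gamma > 1 darkens (expands bright tones).
I_{\mathrm{out}}(x) = I_{\mathrm{in}}(x)^{\gamma}, \qquad I_{\mathrm{in}}(x) \in [0, 1], \quad \gamma > 0
```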
https://arxiv.org/abs/2409.06764
Learning-based methods for underwater image enhancement (UWIE) have undergone extensive exploration. However, learning-based models are usually vulnerable to adversarial examples, and so are UWIE models. To the best of our knowledge, there is no comprehensive study on the adversarial robustness of UWIE models, which indicates that UWIE models are potentially under the threat of adversarial attacks. In this paper, we propose a general adversarial attack protocol. We make a first attempt to conduct adversarial attacks on five well-designed UWIE models on three common underwater image benchmark datasets. Considering the scattering and absorption of light in the underwater environment, there exists a strong correlation between color correction and underwater image enhancement. On that basis, we also design two effective UWIE-oriented adversarial attack methods, Pixel Attack and Color Shift Attack, targeting different color spaces. The results show that the five models exhibit varying degrees of vulnerability to adversarial attacks, and that well-designed small perturbations on degraded images are capable of preventing UWIE models from generating enhanced results. Further, we conduct adversarial training on these models and successfully mitigate the effectiveness of adversarial attacks. In summary, we reveal the adversarial vulnerability of UWIE models and propose a new evaluation dimension for UWIE models.
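As a sketch of the attack protocol, a generic PGD-style pixel attack against an enhancement network can maximize the discrepancy between its outputs for the clean and perturbed inputs. The specific Pixel Attack and Color Shift Attack operate in particular color spaces and are not reproduced here; the model and budget below are placeholders.

```python
# Generic PGD-style "pixel attack" against an enhancement model (illustrative protocol
# only; the paper's Pixel Attack and Color Shift Attack target specific color spaces).
import torch
import torch.nn.functional as F

def pgd_pixel_attack(model, degraded, epsilon=4 / 255, alpha=1 / 255, steps=10):
    with torch.no_grad():
        clean_output = model(degraded)
    delta = torch.zeros_like(degraded, requires_grad=True)
    for _ in range(steps):
        loss = F.mse_loss(model((degraded + delta).clamp(0, 1)), clean_output)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()          # push the output away from the clean result
            delta.clamp_(-epsilon, epsilon)
            delta.grad.zero_()
    return (degraded + delta).detach().clamp(0, 1)

model = torch.nn.Conv2d(3, 3, 3, padding=1)             # stand-in for a UWIE network
adversarial = pgd_pixel_attack(model, torch.rand(1, 3, 64, 64))
```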
https://arxiv.org/abs/2409.06420
Low-light image enhancement remains a critical challenge in computer vision, as does designing lightweight models for edge devices given the computational burden of deep learning. In this article, we introduce an extended version of the Channel-Prior and Gamma-Estimation Network (CPGA-Net), termed CPGA-Net+, which incorporates an attention mechanism driven by a reformulated Atmospheric Scattering Model and effectively addresses both global and local image processing through Plug-in Attention with gamma correction. These innovations enable CPGA-Net+ to achieve superior performance on image enhancement tasks, surpassing lightweight state-of-the-art methods with high efficiency. Our results demonstrate the model's effectiveness and show its potential for applications in resource-constrained environments.
https://arxiv.org/abs/2409.05274
Scene observation from multiple perspectives would bring a more comprehensive visual experience. However, when multiple views are acquired in the dark, the highly correlated views become severely degraded and inconsistent, making it challenging to improve scene understanding with auxiliary views. Recent single-image-based enhancement methods may not be able to provide consistently desirable restoration performance for all views because they ignore potential feature correspondences among different views. To alleviate this issue, we make the first attempt to investigate multi-view low-light image enhancement. First, we construct a new dataset called Multi-View Low-light Triplets (MVLT), including 1,860 pairs of triple images with large illumination ranges and wide noise distribution. Each triplet is equipped with three different viewpoints towards the same scene. Second, we propose a deep multi-view enhancement framework based on the Recurrent Collaborative Network (RCNet). Specifically, in order to benefit from similar texture correspondence across different views, we design the recurrent feature enhancement, alignment and fusion (ReEAF) module, in which intra-view feature enhancement (Intra-view EN) followed by inter-view feature alignment and fusion (Inter-view AF) is performed to model intra-view and inter-view feature propagation sequentially via multi-view collaboration. In addition, two modules, from enhancement to alignment (E2A) and from alignment to enhancement (A2E), are developed to enable interactions between Intra-view EN and Inter-view AF, which explicitly utilize attentive feature weighting and sampling for enhancement and alignment, respectively. Experimental results demonstrate that our RCNet significantly outperforms other state-of-the-art methods. All of our dataset, code, and model will be available at this https URL.
https://arxiv.org/abs/2409.04363