In this paper we present an improved CycleGAN-based model for underwater image enhancement. We utilize the cycle-consistent learning technique of the state-of-the-art CycleGAN model and modify its loss function with a depth-oriented attention term that enhances the contrast of the overall image while keeping global content, color, local texture, and style information intact. We trained the model with the modified loss functions on the benchmark Enhancing Underwater Visual Perception (EUVP) dataset, a large dataset of paired and unpaired underwater images (poor and good quality) taken with seven distinct cameras in a range of visibility situations during research on ocean exploration and human-robot cooperation. In addition, we perform qualitative and quantitative evaluations that support the applied technique and show improved contrast enhancement of underwater imagery. More significantly, the enhanced images outperform those from conventional models on downstream tasks such as underwater navigation, pose estimation, saliency prediction, and object detection and tracking. The results validate the suitability of the model for visual navigation of autonomous underwater vehicles (AUVs).
https://arxiv.org/abs/2404.07649
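The cycle-consistency objective at the heart of CycleGAN penalizes the reconstruction error after a round trip through both generators, and the depth-oriented attention described above can be viewed as a per-pixel weighting of that error. A minimal NumPy sketch, where the generators and the optional attention map are toy stand-ins rather than the paper's networks:

```python
import numpy as np

def cycle_consistency_loss(x, G, F, attention=None):
    """L1 cycle loss ||F(G(x)) - x||_1, optionally weighted per pixel.

    attention: hypothetical depth-oriented weight map in [0, 1]; the
    paper's actual weighting scheme may differ from this sketch.
    """
    diff = np.abs(F(G(x)) - x)
    if attention is not None:
        diff = diff * attention
    return diff.mean()

# Toy invertible "generators" so the round trip is near-perfect.
G = lambda x: 0.9 * x + 0.05
F = lambda y: (y - 0.05) / 0.9

x = np.random.default_rng(0).random((8, 8))
loss = cycle_consistency_loss(x, G, F)
```

Because the toy generators invert each other exactly, the loss is zero up to floating-point error; real generators only approximate this, which is what the loss trains them toward.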
This study systematically investigates the impact of image enhancement techniques on Convolutional Neural Network (CNN)-based Brain Tumor Segmentation, focusing on Histogram Equalization (HE), Contrast Limited Adaptive Histogram Equalization (CLAHE), and their hybrid variations. Employing the U-Net architecture on a dataset of 3064 Brain MRI images, the research delves into preprocessing steps, including resizing and enhancement, to optimize segmentation accuracy. A detailed analysis of the CNN-based U-Net architecture, training, and validation processes is provided. The comparative analysis, utilizing metrics such as Accuracy, Loss, MSE, IoU, and DSC, reveals that the hybrid approach CLAHE-HE consistently outperforms others. Results highlight its superior accuracy (0.9982, 0.9939, 0.9936 for training, testing, and validation, respectively) and robust segmentation overlap, with Jaccard values of 0.9862, 0.9847, and 0.9864, and Dice values of 0.993, 0.9923, and 0.9932 for the same phases, emphasizing its potential in neuro-oncological applications. The study concludes with a call for refinement in segmentation methodologies to further enhance diagnostic precision and treatment planning in neuro-oncology.
https://arxiv.org/abs/2404.05341
Low-light image enhancement (LLIE) aims to improve low-illumination images. However, existing methods face two challenges: (1) uncertainty in restoration from diverse brightness degradations; (2) loss of texture and color information caused by noise suppression and light enhancement. In this paper, we propose a novel enhancement approach, CodeEnhance, that leverages quantized priors and image refinement to address these challenges. In particular, we reframe LLIE as learning an image-to-code mapping from low-light images to a discrete codebook that has been learned from high-quality images. To enhance this process, a Semantic Embedding Module (SEM) is introduced to integrate semantic information with low-level features, and a Codebook Shift (CS) mechanism is designed to adapt the pre-learned codebook to better suit the distinct characteristics of our low-light dataset. Additionally, we present an Interactive Feature Transformation (IFT) module to refine texture and color information during image reconstruction, allowing for interactive enhancement based on user preferences. Extensive experiments on both real-world and synthetic benchmarks demonstrate that the incorporation of prior knowledge and controllable information transfer significantly enhances LLIE performance in terms of quality and fidelity. The proposed CodeEnhance exhibits superior robustness to various degradations, including uneven illumination, noise, and color distortion.
https://arxiv.org/abs/2404.05253
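The image-to-code mapping described above bottoms out in a standard vector-quantization lookup: each feature vector is replaced by its nearest codebook entry. A minimal sketch (function names and the tiny codebook are illustrative, not the paper's API):

```python
import numpy as np

def nearest_code(features, codebook):
    """Map each feature vector to its nearest codebook entry (VQ lookup).

    features: (N, D) features from a low-light image; codebook: (K, D)
    entries learned from high-quality images.
    """
    # Squared Euclidean distance between every feature and every code.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
feats = np.array([[0.1, -0.1], [0.9, 1.2]])
quantized, idx = nearest_code(feats, codebook)
```

The paper's Codebook Shift mechanism would correspond to adapting `codebook` itself (e.g. adding a learned offset) before this lookup, so the high-quality prior better matches the low-light domain.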
This paper introduces the physics-inspired synthesized underwater image dataset (PHISWID), a dataset tailored for enhancing underwater image processing through physics-inspired image synthesis. Deep learning approaches to underwater image enhancement typically demand extensive datasets, yet acquiring paired clean and degraded underwater ones poses significant challenges. While several underwater image datasets have been proposed using physics-based synthesis, a publicly accessible collection has been lacking. Additionally, most underwater image synthesis approaches do not intend to reproduce atmospheric scenes, resulting in incomplete enhancement. PHISWID addresses this gap by offering a set of paired ground-truth (atmospheric) and synthetically degraded underwater images, showcasing not only color degradation but also the often-neglected effects of marine snow, a composite of organic matter and sand particles that considerably impairs underwater image clarity. The dataset applies these degradations to atmospheric RGB-D images, enhancing the dataset's realism and applicability. PHISWID is particularly valuable for training deep neural networks in a supervised learning setting and for objectively assessing image quality in benchmark analyses. Our results reveal that even a basic U-Net architecture, when trained with PHISWID, substantially outperforms existing methods in underwater image enhancement. We intend to release PHISWID publicly, contributing a significant resource to the advancement of underwater imaging technology.
https://arxiv.org/abs/2404.03998
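Physics-inspired degradation of an atmospheric RGB-D image can be sketched with the standard underwater image-formation model, where per-channel transmission decays exponentially with depth and backscatter fills in. The attenuation and backscatter values below are illustrative (red attenuates fastest in water), and PHISWID additionally models marine snow, which this sketch omits:

```python
import numpy as np

def degrade_underwater(J, depth, beta=(0.60, 0.25, 0.10), B=(0.05, 0.35, 0.45)):
    """Apply a simplified underwater image-formation model to an RGB-D pair.

    J: clean RGB image in [0, 1], shape (H, W, 3); depth: (H, W) in meters.
    beta: per-channel attenuation coefficients; B: backscatter color.
    I = J * exp(-beta * d) + B * (1 - exp(-beta * d)).
    """
    beta = np.asarray(beta)
    B = np.asarray(B)
    t = np.exp(-beta[None, None, :] * depth[..., None])  # per-channel transmission
    return J * t + B * (1.0 - t)

J = np.random.default_rng(0).random((16, 16, 3))
shallow = degrade_underwater(J, np.zeros((16, 16)))    # depth 0: no degradation
deep = degrade_underwater(J, np.full((16, 16), 20.0))  # red channel collapses toward backscatter
```

At depth 0 the image is untouched; at large depth each channel converges to its backscatter color, reproducing the characteristic blue-green cast of degraded underwater imagery.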
Many existing methods for low-light image enhancement (LLIE) based on Retinex theory ignore important factors that affect the validity of this theory in digital imaging, such as noise, quantization error, non-linearity, and dynamic range overflow. In this paper, we propose a new expression called Digital-Imaging Retinex theory (DI-Retinex) through theoretical and experimental analysis of Retinex theory in digital imaging. Our new expression includes an offset term in the enhancement model, which allows for pixel-wise brightness and contrast adjustment with a non-linear mapping function. In addition, to solve the low-light enhancement problem in an unsupervised manner, we propose an image-adaptive masked reverse degradation loss in Gamma space. We also design a variance suppression loss for regulating the additional offset term. Extensive experiments show that our proposed method outperforms all existing unsupervised methods in terms of visual quality, model size, and speed. Our algorithm can also assist downstream face detectors in low light, as it yields the largest performance gain after low-light enhancement compared to other methods.
https://arxiv.org/abs/2404.03327
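The role of the offset term can be seen in a schematic form of the enhancement model. Classic Retinex-style enhancement is purely multiplicative (y = gain * x); DI-Retinex argues for an additional additive offset. The sketch below uses scalar gain/offset for brevity, whereas the paper predicts them per pixel with a non-linear mapping — treat this strictly as an illustration of the model shape, not the paper's formula:

```python
import numpy as np

def enhance_with_offset(x, gain, offset):
    """Schematic offset-augmented enhancement: y = clip(gain * x + offset).

    gain/offset are scalars here for brevity; DI-Retinex predicts them
    per pixel via a non-linear mapping function.
    """
    return np.clip(gain * x + offset, 0.0, 1.0)

x = np.full((4, 4), 0.1)                     # dark input
y_mult = enhance_with_offset(x, 3.0, 0.0)    # multiplicative model only
y_full = enhance_with_offset(x, 3.0, 0.05)   # with the additive offset term
```

The offset lets the model raise brightness independently of contrast scaling, which a purely multiplicative model cannot do for near-zero pixels.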
We present a new additive image factorization technique that treats images as composed of multiple latent specular components which can be estimated simply and recursively by modulating the sparsity during decomposition. Our model-driven RSFNet estimates these factors by unrolling the optimization into network layers, requiring only a few scalars to be learned. The resultant factors are interpretable by design and can be fused for different image enhancement tasks via a network or combined directly by the user in a controllable fashion. Based on RSFNet, we detail a zero-reference Low Light Enhancement (LLE) application trained without paired or unpaired supervision. Our system improves the state-of-the-art performance on standard benchmarks and achieves better generalization on multiple other datasets. We also integrate our factors with other task-specific fusion networks for applications like deraining, deblurring, and dehazing with negligible overhead, thereby highlighting the multi-domain and multi-task generalizability of our proposed RSFNet. The code and data are released for reproducibility on the project homepage.
https://arxiv.org/abs/2404.01998
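Recursive, sparsity-modulated additive factorization can be illustrated with soft-thresholding: peel off a sparse factor at a high threshold, subtract it, and repeat at lower thresholds. This is only a toy analogue — RSFNet learns the few scalars involved by unrolling an optimization into network layers — but it shows why the factors sum back to the image by construction:

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrinkage operator: zero out small entries, shrink the rest."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def recursive_sparse_factors(img, taus=(0.6, 0.3, 0.1)):
    """Recursively extract sparse additive factors by decreasing the
    threshold level (toy sparsity modulation; threshold values are
    illustrative)."""
    factors, residual = [], img.copy()
    for tau in taus:
        f = soft_threshold(residual, tau)
        factors.append(f)
        residual = residual - f
    return factors, residual

img = np.random.default_rng(0).standard_normal((8, 8))
factors, residual = recursive_sparse_factors(img)
recon = sum(factors) + residual
```

Because each step subtracts exactly what it extracted, the decomposition is additive and lossless, mirroring the "fuse or combine directly" property of the paper's factors.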
In this paper we propose a novel modification of Contrastive Language-Image Pre-Training (CLIP) guidance for the task of unsupervised backlit image enhancement. Our work builds on the state-of-the-art CLIP-LIT approach, which learns a prompt pair by constraining the text-image similarity between a prompt (negative/positive sample) and a corresponding image (backlit image/well-lit image) in the CLIP embedding space. Learned prompts then guide an image enhancement network. Based on the CLIP-LIT framework, we propose two novel methods for CLIP guidance. First, we show that instead of tuning prompts in the space of text embeddings, it is possible to directly tune their embeddings in the latent space without any loss in quality. This accelerates training and potentially enables the use of additional encoders that do not have a text encoder. Second, we propose a novel approach that does not require any prompt tuning. Instead, based on CLIP embeddings of backlit and well-lit images from training data, we compute the residual vector in the embedding space as a simple difference between the mean embeddings of the well-lit and backlit images. This vector then guides the enhancement network during training, pushing a backlit image towards the space of well-lit images. This approach further dramatically reduces training time, stabilizes training and produces high quality enhanced images without artifacts, both in supervised and unsupervised training regimes. Additionally, we show that residual vectors can be interpreted, revealing biases in training data, and thereby enabling potential bias correction.
https://arxiv.org/abs/2404.01889
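The residual-vector construction above is concrete enough to sketch: it is just the difference of mean CLIP embeddings, normalized, and one plausible way to turn it into a training signal is to reward enhancement directions aligned with it. The embeddings below are random placeholders for real CLIP image features, and the cosine-alignment score is an assumption about the loss shape, not the paper's exact formulation:

```python
import numpy as np

def residual_vector(well_lit_emb, backlit_emb):
    """Guidance direction: mean well-lit embedding minus mean backlit
    embedding, L2-normalized."""
    r = well_lit_emb.mean(axis=0) - backlit_emb.mean(axis=0)
    return r / (np.linalg.norm(r) + 1e-12)

def guidance_score(enhanced_emb, backlit_emb, r):
    """Cosine alignment between the enhancement direction and the
    residual vector (one plausible training signal)."""
    d = enhanced_emb - backlit_emb
    return float(d @ r / (np.linalg.norm(d) + 1e-12))

rng = np.random.default_rng(0)
well = rng.normal(size=(50, 16)) + 1.0   # toy "well-lit" embedding cluster
back = rng.normal(size=(50, 16)) - 1.0   # toy "backlit" embedding cluster
r = residual_vector(well, back)
score = guidance_score(back[0] + r, back[0], r)  # moving along r scores 1
```

Interpreting `r` is also direct: projecting individual training embeddings onto it reveals which images (and hence which data biases) dominate the well-lit/backlit separation.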
Recent image tone adjustment (or enhancement) approaches have predominantly adopted supervised learning to learn human-centric perceptual assessment. However, these approaches are constrained by the intrinsic challenges of supervised learning. Primarily, the requirement for expertly curated or retouched images escalates data acquisition expenses. Moreover, their coverage of target styles is confined to the stylistic variants inferred from the training data. To surmount these challenges, we propose CLIPtone, an unsupervised learning-based approach for text-based image tone adjustment that extends an existing image enhancement method to accommodate natural language descriptions. Specifically, we design a hyper-network to adaptively modulate the pretrained parameters of the backbone model based on the text description. To assess whether the adjusted image aligns with the text description without a ground-truth image, we utilize CLIP, which is trained on a vast set of language-image pairs and thus encompasses knowledge of human perception. The major advantages of our approach are threefold: (i) minimal data collection expenses, (ii) support for a range of adjustments, and (iii) the ability to handle novel text descriptions unseen in training. Our approach's efficacy is demonstrated through comprehensive experiments, including a user study.
https://arxiv.org/abs/2404.01123
Event cameras have recently received much attention for low-light image enhancement (LIE) thanks to their distinct advantages, such as high dynamic range. However, current research is prohibitively restricted by the lack of large-scale, real-world, and spatio-temporally aligned event-image datasets. To this end, we propose a real-world (indoor and outdoor) dataset comprising over 30K pairs of images and events under both low and normal illumination conditions. To achieve this, we utilize a robotic arm that traces a consistent non-linear trajectory to curate the dataset with spatial alignment precision under 0.03mm. We then introduce a matching alignment strategy, rendering 90% of our dataset with errors of less than 0.01s. Based on the dataset, we propose a novel event-guided LIE approach, called EvLight, towards robust performance in real-world low-light scenes. Specifically, we first design the multi-scale holistic fusion branch to extract holistic structural and textural information from both events and images. To ensure robustness against variations in the regional illumination and noise, we then introduce a Signal-to-Noise-Ratio (SNR)-guided regional feature selection to selectively fuse features of images from regions with high SNR and enhance those with low SNR by extracting regional structure information from events. Extensive experiments on our dataset and the synthetic SDSD dataset demonstrate that our EvLight significantly surpasses frame-based methods. Code and datasets are available at this https URL.
https://arxiv.org/abs/2404.00834
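SNR-guided regional selection can be sketched with a crude per-pixel SNR estimate (local mean over local absolute deviation, a common heuristic in SNR-aware low-light work) and a hard threshold that decides where to trust image features versus event-derived features. EvLight's actual estimator and fusion are learned; the functions, threshold, and kernel size below are illustrative:

```python
import numpy as np

def local_mean(img, k=3):
    """Box-filter mean via edge padding (a crude local denoiser)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def snr_map(img, k=3, eps=1e-6):
    """Per-pixel SNR estimate: local mean over mean absolute deviation."""
    mu = local_mean(img, k)
    return mu / (local_mean(np.abs(img - mu), k) + eps)

def snr_guided_fuse(image_feat, event_feat, snr, thresh=2.0):
    """Keep image features where SNR is high; fall back to event-derived
    structure in low-SNR regions."""
    w = (snr > thresh).astype(np.float64)
    return w * image_feat + (1.0 - w) * event_feat

clean = np.ones((8, 8))  # noiseless region -> very high SNR
fused = snr_guided_fuse(clean, np.zeros((8, 8)), snr_map(clean))
```

In a noiseless region the SNR estimate is enormous, so the image features pass through untouched; noisy regions would instead draw on the event branch.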
While burst LR images are useful for improving SR image quality compared with a single LR image, prior SR networks that accept burst LR images are trained in a deterministic manner, which is known to produce blurry SR images. In addition, it is difficult to perfectly align the burst LR images, making the SR image blurrier still. Since such blurry images are perceptually degraded, we aim to reconstruct sharp, high-fidelity boundaries. Such high-fidelity images can be reconstructed by diffusion models. However, prior SR methods using diffusion models are not properly optimized for the burst SR task. Specifically, the reverse process starting from a random sample is not optimized for image enhancement and restoration tasks, including burst SR. In our proposed method, on the other hand, burst LR features are used to reconstruct an initial burst SR image that is fed into an intermediate step of the diffusion model. This reverse process from the intermediate step 1) skips the diffusion steps that reconstruct the global structure of the image and 2) focuses on the steps that refine detailed textures. Our experimental results demonstrate that our method improves the scores of perceptual quality metrics. Code: this https URL
https://arxiv.org/abs/2403.19428
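The key idea — starting the reverse process at an intermediate step rather than from pure noise — relies on the standard DDPM forward process to noise the initial burst SR estimate to step t. A schematic sketch; the schedule values, step choice, and zero-image stand-in for the burst estimate are illustrative, and the denoising network itself is omitted:

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Standard DDPM linear noise schedule: cumulative product of (1 - beta_t)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def to_intermediate_step(x0_estimate, t, alpha_bar, rng):
    """Noise an initial estimate to diffusion step t via the forward
    process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    The reverse process can then start at t instead of T, skipping the
    global-structure steps and spending the rest on texture refinement."""
    eps = rng.normal(size=x0_estimate.shape)
    return np.sqrt(alpha_bar[t]) * x0_estimate + np.sqrt(1.0 - alpha_bar[t]) * eps

alpha_bar = make_alpha_bar()
rng = np.random.default_rng(0)
x0 = np.zeros((32, 32))                                  # stand-in for the burst SR estimate
x_mid = to_intermediate_step(x0, 300, alpha_bar, rng)    # partly noised: structure survives
x_late = to_intermediate_step(x0, 999, alpha_bar, rng)   # nearly pure noise
```

At step 300 the estimate still dominates the sample, whereas at step 999 almost all signal is gone — which is exactly why entering mid-way preserves the global structure the burst features already provide.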
The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose Image Color Enhancement Lookup Table (ICELUT), which adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, maintaining performance even with a heavily downsampled 32x32 input image. This enables ICELUT, the first purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least one order of magnitude faster than any CNN solution. Codes are available at this https URL.
https://arxiv.org/abs/2403.19238
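The inference-time payoff of a LUT-based enhancer is that the network disappears entirely: enhancement reduces to array indexing. The toy table below bakes a fixed gain curve into 256 entries; ICELUT's tables are learned (distilled from the pointwise convolutions and split FC layer) and higher-dimensional, but the lookup mechanics are the same:

```python
import numpy as np

# Toy 1-D gain curve baked into a 256-entry lookup table.
# The 1.3x gain is an arbitrary illustrative value.
gain = 1.3
lut = np.clip(np.arange(256) * gain, 0, 255).astype(np.uint8)

img = np.random.default_rng(0).integers(0, 256, (32, 32, 3)).astype(np.uint8)
enhanced = lut[img]  # pure table lookup: no convolutions at inference
```

Each output pixel costs one memory read, independent of model "depth" — which is why this style of deployment is hardware-agnostic and dramatically faster than CNN inference on edge devices.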
In recent years, significant progress has been made in the field of underwater image enhancement (UIE). However, its practical utility for high-level vision tasks, such as underwater object detection (UOD) in Autonomous Underwater Vehicles (AUVs), remains relatively unexplored. It may be attributed to several factors: (1) Existing methods typically employ UIE as a pre-processing step, which inevitably introduces considerable computational overhead and latency. (2) The process of enhancing images prior to training object detectors may not necessarily yield performance improvements. (3) The complex underwater environments can induce significant domain shifts across different scenarios, seriously deteriorating the UOD performance. To address these challenges, we introduce EnYOLO, an integrated real-time framework designed for simultaneous UIE and UOD with domain-adaptation capability. Specifically, both the UIE and UOD task heads share the same network backbone and utilize a lightweight design. Furthermore, to ensure balanced training for both tasks, we present a multi-stage training strategy aimed at consistently enhancing their performance. Additionally, we propose a novel domain-adaptation strategy to align feature embeddings originating from diverse underwater environments. Comprehensive experiments demonstrate that our framework not only achieves state-of-the-art (SOTA) performance in both UIE and UOD tasks, but also shows superior adaptability when applied to different underwater scenarios. Our efficiency analysis further highlights the substantial potential of our framework for onboard deployment.
https://arxiv.org/abs/2403.19079
Deep neural networks have achieved remarkable success in a variety of computer vision applications. However, accuracy degrades when the data distribution shifts between training and testing. As a solution to this problem, Test-Time Adaptation (TTA) has been well studied because of its practicality. Although TTA methods increase accuracy under distribution shift by updating the model at test time, using high-uncertainty predictions is known to degrade accuracy. Since the input image is the root of the distribution shift, we incorporate a new perspective into TTA methods: enhancing the input image to reduce prediction uncertainty. We hypothesize that enhancing the input image reduces prediction uncertainty and increases the accuracy of TTA methods. On the basis of this hypothesis, we propose a novel method: Test-time Enhancer and Classifier Adaptation (TECA). In TECA, the classification model is combined with an image enhancement model that transforms input images into recognition-friendly ones, and both models are updated by existing TTA methods. Furthermore, we found that the prediction from the enhanced image does not always have lower uncertainty than the prediction from the original image. Thus, we propose logit switching, which compares the uncertainty measures of the two predictions and outputs the one with lower uncertainty. In our experiments, we evaluate TECA with various TTA methods and show that TECA reduces prediction uncertainty and increases the accuracy of TTA methods despite having no hyperparameters and little parameter overhead.
https://arxiv.org/abs/2403.17423
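The logit-switching rule described above is simple enough to state in full: score both predictions with an uncertainty measure and emit the more confident one. Using softmax entropy as the uncertainty measure is an assumption here (the paper only specifies "uncertainty measure"), but the selection logic matches the description:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy(logits):
    """Predictive entropy of a logit vector (assumed uncertainty measure)."""
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def logit_switching(logits_original, logits_enhanced):
    """Return whichever prediction has lower entropy, since the enhanced
    image does not always yield the more confident prediction."""
    if entropy(logits_enhanced) < entropy(logits_original):
        return logits_enhanced
    return logits_original

sharp = np.array([5.0, 0.0, 0.0])  # confident prediction
flat = np.array([1.0, 1.0, 1.0])   # maximally uncertain prediction
chosen = logit_switching(flat, sharp)
```

Note the rule adds no trainable parameters or hyperparameters, consistent with the paper's claim of negligible overhead.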
Ultrasound imaging is crucial for evaluating organ morphology and function, yet depth adjustment can degrade image quality and field-of-view, presenting a depth-dependent dilemma. Traditional interpolation-based zoom-in techniques often sacrifice detail and introduce artifacts. Motivated by the potential of arbitrary-scale super-resolution to naturally address these inherent challenges, we present the Residual Dense Swin Transformer Network (RDSTN), designed to capture the non-local characteristics and long-range dependencies intrinsic to ultrasound images. It comprises a linear embedding module for feature enhancement, an encoder with shifted-window attention for modeling non-locality, and an MLP decoder for continuous detail reconstruction. This strategy streamlines balancing image quality and field-of-view, which offers superior textures over traditional methods. Experimentally, RDSTN outperforms existing approaches while requiring fewer parameters. In conclusion, RDSTN shows promising potential for ultrasound image enhancement by overcoming the limitations of conventional interpolation-based methods and achieving depth-independent imaging.
https://arxiv.org/abs/2403.16384
In the image acquisition process, various forms of degradation, including noise, haze, and rain, are frequently introduced. These degradations typically arise from the inherent limitations of cameras or unfavorable ambient conditions. To recover clean images from degraded versions, numerous specialized restoration methods have been developed, each targeting a specific type of degradation. Recently, all-in-one algorithms have garnered significant attention by addressing different types of degradations within a single model without requiring prior information of the input degradation type. However, these methods purely operate in the spatial domain and do not delve into the distinct frequency variations inherent to different degradation types. To address this gap, we propose an adaptive all-in-one image restoration network based on frequency mining and modulation. Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands, thereby requiring different treatments for each restoration task. Specifically, we first mine low- and high-frequency information from the input features, guided by the adaptively decoupled spectra of the degraded image. The extracted features are then modulated by a bidirectional operator to facilitate interactions between different frequency components. Finally, the modulated features are merged into the original input for a progressively guided restoration. With this approach, the model achieves adaptive reconstruction by accentuating the informative frequency subbands according to different input degradations. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on different image restoration tasks, including denoising, dehazing, deraining, motion deblurring, and low-light image enhancement. Our code is available at this https URL.
https://arxiv.org/abs/2403.14614
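The frequency-mining step rests on a standard decomposition: low- and high-frequency components can be separated with a low-pass mask in the Fourier domain, and they sum back to the original image exactly. The hand-fixed circular mask below stands in for the paper's adaptively decoupled spectra:

```python
import numpy as np

def split_frequencies(img, radius=8):
    """Decompose a grayscale image into low- and high-frequency parts
    using a circular low-pass mask in the centered Fourier domain.
    The fixed radius is illustrative; the paper decouples the spectrum
    adaptively per degradation."""
    h, w = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.ogrid[:h, :w]
    mask = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2) <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
    high = img - low
    return low, high

img = np.random.default_rng(0).random((64, 64))
low, high = split_frequencies(img)
```

Because `high` is defined as the residual, the split is lossless; a degradation like haze mostly perturbs `low`, while noise mostly lives in `high`, which is the observation motivating per-band treatment.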
Underwater video enhancement (UVE) aims to improve the visibility and frame quality of underwater videos, which has significant implications for marine research and exploration. However, existing methods primarily focus on developing image enhancement algorithms to enhance each frame independently. There is a lack of supervised datasets and models specifically tailored for UVE tasks. To fill this gap, we construct the Synthetic Underwater Video Enhancement (SUVE) dataset, comprising 840 diverse underwater-style videos paired with ground-truth reference videos. Based on this dataset, we train a novel underwater video enhancement model, UVENet, which utilizes inter-frame relationships to achieve better enhancement performance. Through extensive experiments on both synthetic and real underwater videos, we demonstrate the effectiveness of our approach. This study represents the first comprehensive exploration of UVE to our knowledge. The code is available at https://anonymous.4open.science/r/UVENet.
https://arxiv.org/abs/2403.11506
Full DNN-based image signal processors (ISPs) have been actively studied and have achieved superior image quality compared to conventional ISPs. In contrast to this trend, we propose a lightweight ISP that consists of simple conventional ISP functions but achieves high image quality by increasing expressiveness. Specifically, instead of tuning the parameters of the ISP, we propose to control them dynamically for each environment and even locally. As a result, state-of-the-art accuracy is achieved on various datasets, including other tasks like tone mapping and image enhancement, even though ours is lighter than DNN-based ISPs. Additionally, our method can process different image sensors with a single ISP through dynamic control, whereas conventional methods require training for each sensor.
https://arxiv.org/abs/2403.10091
Supervised deep learning techniques can be used to generate synthetic 7T MRIs from 3T MRI inputs. This image enhancement process leverages the advantages of ultra-high-field MRI to improve the signal-to-noise and contrast-to-noise ratios of 3T acquisitions. In this paper, we introduce multiple novel 7T synthesization algorithms based on custom-designed variants of the V-Net convolutional neural network. We demonstrate that the V-Net based model has superior performance in enhancing both single-site and multi-site MRI datasets compared to the existing benchmark model. When trained on 3T-7T MRI pairs from 8 subjects with mild Traumatic Brain Injury (TBI), our model achieves state-of-the-art 7T synthesization performance. Compared to previous works, synthetic 7T images generated from our pipeline also display superior enhancement of pathological tissue. Additionally, we implement and test a data augmentation scheme for training models that are robust to variations in the input distribution. This allows synthetic 7T models to accommodate intra-scanner and inter-scanner variability in multisite datasets. On a harmonized dataset consisting of 18 3T-7T MRI pairs from two institutions, including both healthy subjects and those with mild TBI, our model maintains its performance and can generalize to 3T MRI inputs with lower resolution. Our findings demonstrate the promise of V-Net based models for MRI enhancement and offer a preliminary probe into improving the generalizability of synthetic 7T models with data augmentation.
https://arxiv.org/abs/2403.08979
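The augmentation scheme for scanner variability could look like the following minimal sketch: random global intensity scaling plus additive Gaussian noise applied to the 3T input only, so the model learns to map a distribution of plausible 3T appearances to the same 7T target. The scaling range and noise level are assumptions for illustration, not the paper's actual parameters.

```python
import numpy as np

def augment_3t(volume, rng):
    # Hypothetical augmentation for intra-/inter-scanner variability:
    # random global intensity scaling and additive Gaussian noise
    # applied to the 3T input volume only (the 7T target is untouched).
    scale = rng.uniform(0.9, 1.1)                      # assumed scaling range
    noise = rng.normal(0.0, 0.01, size=volume.shape)   # assumed noise level
    return np.clip(volume * scale + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
vol = rng.uniform(0.2, 0.8, size=(8, 8, 8))  # toy normalized 3T volume
aug = augment_3t(vol, rng)
```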
In this paper, we present a novel fog-aware object detection network called FogGuard, designed to address the challenges posed by foggy weather conditions. Autonomous driving systems rely heavily on accurate object detection algorithms, but adverse weather conditions can significantly impact the reliability of deep neural networks (DNNs). Existing approaches fall into two main categories: 1) image enhancement, such as IA-YOLO, and 2) domain-adaptation-based approaches. Image-enhancement techniques attempt to generate a fog-free image. However, recovering a fogless image from a foggy one is a much harder problem than detecting objects in the foggy image directly. Domain-adaptation approaches, on the other hand, do not make use of labelled datasets in the target domain. Both categories thus attempt to solve a harder version of the problem. Our approach instead builds on fine-tuning: the framework is specifically designed to compensate for the foggy conditions present in the scene, ensuring robust performance even in poor visibility. We adopt YOLOv3 as the baseline object detection algorithm and introduce a novel teacher-student perceptual loss to achieve high-accuracy object detection in foggy images. Through extensive evaluations on common datasets such as PASCAL VOC and RTTS, we demonstrate the improvement in performance achieved by our network: FogGuard achieves 69.43\% mAP on the RTTS dataset, compared to 57.78\% for YOLOv3. Furthermore, while our training method increases training-time complexity, it introduces no additional overhead during inference compared to the regular YOLO network.
https://arxiv.org/abs/2403.08939
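A minimal sketch of a teacher-student perceptual loss of this kind, assuming the teacher network sees the clean image and the student sees the foggy one: intermediate feature maps from the two networks are matched with an MSE term that is added to the ordinary detection loss. The weight `w` and the plain summation over feature levels are illustrative assumptions, not FogGuard's exact formulation.

```python
import numpy as np

def perceptual_distill_loss(student_feats, teacher_feats, det_loss, w=0.5):
    # Teacher-student perceptual term: MSE between corresponding
    # intermediate feature maps of the teacher (clean input) and the
    # student (foggy input), added to the usual detection loss.
    mse = sum(np.mean((s - t) ** 2)
              for s, t in zip(student_feats, teacher_feats))
    return det_loss + w * mse

# Toy feature maps from two network levels (stand-ins for YOLO features).
teacher = [np.ones((4, 4)), np.zeros((2, 2))]
student = [np.ones((4, 4)) * 0.5, np.zeros((2, 2))]
loss = perceptual_distill_loss(student, teacher, det_loss=1.0)
# mse = mean(0.5**2) + 0 = 0.25, so loss = 1.0 + 0.5 * 0.25 = 1.125
```

Because the extra term only affects training, inference runs the student network alone, which matches the claim of no added inference overhead.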
Dark image enhancement aims at converting dark images to normal-light images. Existing dark image enhancement methods take uncompressed dark images as inputs and achieve strong performance. In practice, however, dark images are often compressed before storage or transmission over the Internet, and current methods perform poorly on compressed dark images: artifacts hidden in the dark regions are amplified, resulting in uncomfortable visual effects for observers. Based on this observation, this study aims at enhancing compressed dark images while avoiding the amplification of compression artifacts. Since texture details intertwine with compression artifacts in compressed dark images, detail enhancement and blocking-artifact suppression contradict each other in image space. Therefore, we handle the task in latent space. To this end, we propose a novel latent mapping network based on a variational auto-encoder (VAE). Firstly, unlike previous VAE-based methods that use only single-resolution features, we exploit multiple latent spaces with multi-resolution features to reduce detail blur and improve image fidelity. Specifically, we train two multi-level VAEs to project compressed dark images and normal-light images into their respective latent spaces. Secondly, we leverage a latent mapping network to transform features from the compressed-dark space to the normal-light space. Since the degradation models of darkness and compression differ, the latent mapping process is divided into an enlightening branch and a deblocking branch. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art performance in compressed dark image enhancement.
https://arxiv.org/abs/2403.07622
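The latent-space data flow can be illustrated with toy linear stand-ins for the two VAEs and the two-branch mapper: encode the compressed dark image, transform its latent through separate enlightening and deblocking branches, then decode with the normal-light decoder. The linear maps and the additive fusion of the two branches are assumptions made only to show the wiring, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for the two (multi-level) VAEs.
def encode_dark(x, W):
    return x @ W        # compressed-dark encoder -> latent

def decode_normal(z, W):
    return z @ W        # normal-light decoder -> image

def latent_map(z, W_enlight, W_deblock):
    # Two-branch latent mapping: an enlightening branch (darkness)
    # and a deblocking branch (compression), fused here by
    # summation (assumed fusion, for illustration only).
    return z @ W_enlight + z @ W_deblock

d = 8  # toy latent dimension
W_enc = rng.normal(size=(d, d))
W_dec = rng.normal(size=(d, d))
W_en = rng.normal(size=(d, d)) * 0.1
W_db = rng.normal(size=(d, d)) * 0.1

x_dark = rng.normal(size=(1, d))        # toy compressed dark "image"
z = encode_dark(x_dark, W_enc)
y = decode_normal(latent_map(z, W_en, W_db), W_dec)
```

Splitting the mapping into two branches mirrors the observation that darkness and compression follow different degradation models, so each branch can specialize.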