Noise, an unwanted component of an image, can degrade image quality during capture or transmission, and removing it remains a challenging task. Digital image processing, a branch of digital signal processing, offers a wide variety of algorithms that can be applied to an image or input dataset to obtain meaningful results. In image processing research, removing noise before further analysis is essential: denoised images are clearer, enabling better interpretation and analysis in medical imaging, satellite imagery, and radar applications. While numerous denoising algorithms exist, each comes with its own assumptions, strengths, and limitations. This paper evaluates the effectiveness of different filtering techniques on images corrupted by eight types of noise. It compares Wiener, median, Gaussian, mean, low-pass, high-pass, Laplacian, and bilateral filtering using the peak signal-to-noise ratio (PSNR) as the performance metric. By applying each filter to each noise model, it shows how the filters behave under different kinds of noise and helps determine which filtering strategy is most appropriate for a given noise model under given circumstances.
https://arxiv.org/abs/2410.21946
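As a rough, self-contained illustration of the kind of comparison this paper describes, the sketch below corrupts a synthetic grayscale image with Gaussian noise, applies a few of the listed filters via OpenCV, and scores each result with PSNR; the filter parameters and noise level are arbitrary choices for illustration, not values from the paper.

```python
import cv2
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray) -> float:
    """Peak signal-to-noise ratio for 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

# Synthetic clean image (a smooth gradient) corrupted with additive Gaussian noise;
# in practice `clean` would be loaded with cv2.imread and other noise models tried too.
clean = np.tile(np.linspace(0, 255, 256, dtype=np.uint8), (256, 1))
noisy = np.clip(clean.astype(np.float64) + np.random.normal(0, 15, clean.shape), 0, 255).astype(np.uint8)

filtered = {
    "median":    cv2.medianBlur(noisy, 5),
    "gaussian":  cv2.GaussianBlur(noisy, (5, 5), sigmaX=1.5),
    "mean":      cv2.blur(noisy, (5, 5)),
    "bilateral": cv2.bilateralFilter(noisy, d=9, sigmaColor=75, sigmaSpace=75),
}

print(f"noisy input PSNR = {psnr(clean, noisy):.2f} dB")
for name, result in filtered.items():
    print(f"{name:<10} PSNR = {psnr(clean, result):.2f} dB")
```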
Frequency information (e.g., the Discrete Wavelet Transform and Fast Fourier Transform) has been widely applied to the problem of Low-Light Image Enhancement (LLIE). However, existing frequency-based models primarily operate in the simple wavelet or Fourier space of images, which under-utilizes the valid global and local information available in each space. We find that wavelet frequency information is more sensitive to global brightness due to its low-frequency component, while Fourier frequency information is more sensitive to local details due to its phase component. To achieve superior preliminary brightness enhancement by optimally integrating spatial channel information with the low-frequency components of the wavelet transform, we introduce channel-wise Mamba, which compensates for CNNs' limited long-range dependencies and has lower complexity than Diffusion and Transformer models. In this work, we therefore propose a novel Wavelet-based Mamba with Fourier Adjustment model called WalMaFa, consisting of a Wavelet-based Mamba Block (WMB) and a Fast Fourier Adjustment Block (FFAB). We employ an Encoder-Latent-Decoder structure to accomplish the end-to-end transformation. Specifically, WMB is adopted in the Encoder and Decoder to enhance global brightness, while FFAB is adopted in the Latent to fine-tune local texture details and alleviate ambiguity. Extensive experiments demonstrate that our proposed WalMaFa achieves state-of-the-art performance with fewer computational resources and faster speed. Code is now available at: this https URL.
https://arxiv.org/abs/2410.20314
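A minimal sketch of the two frequency views the abstract contrasts, using PyWavelets and NumPy on a stand-in grayscale array: the low-frequency wavelet sub-band relates to global brightness, while the Fourier phase carries local structure. This is only the underlying decomposition, not the WalMaFa model.

```python
import numpy as np
import pywt

img = np.random.rand(256, 256)  # stand-in for a normalized grayscale low-light image

# Wavelet view: the approximation (low-frequency) band summarizes global brightness.
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
print("mean of low-frequency band:", cA.mean())   # tracks overall illumination

# Fourier view: split the spectrum into amplitude and phase.
spectrum = np.fft.fft2(img)
amplitude, phase = np.abs(spectrum), np.angle(spectrum)

# Reconstructing from phase only (unit amplitude) keeps edges and local structure,
# which is why a Fourier branch is well suited to refining local detail.
phase_only = np.real(np.fft.ifft2(np.exp(1j * phase)))
```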
Digital Signal Processing (DSP) and Digital Image Processing (DIP) with Machine Learning (ML) and Deep Learning (DL) are popular research areas in Computer Vision and related fields. We highlight transformative applications in image enhancement, filtering techniques, and pattern recognition. By integrating frameworks like the Discrete Fourier Transform (DFT), Z-Transform, and Fourier Transform methods, we enable robust data manipulation and feature extraction essential for AI-driven tasks. Using Python, we implement algorithms that optimize real-time data processing, forming a foundation for scalable, high-performance solutions in computer vision. This work illustrates the potential of ML and DL to advance DSP and DIP methodologies, contributing to artificial intelligence, automated feature extraction, and applications across diverse domains.
https://arxiv.org/abs/2410.20304
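Since the abstract highlights Python implementations of transform-based processing, here is a small self-contained example of frequency-domain filtering with the DFT; the cutoff radius is an arbitrary illustrative choice.

```python
import numpy as np

def dft_lowpass(image: np.ndarray, cutoff: int = 30) -> np.ndarray:
    """Low-pass filter a grayscale image by masking its centered DFT."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    rows, cols = image.shape
    y, x = np.ogrid[:rows, :cols]
    mask = (y - rows // 2) ** 2 + (x - cols // 2) ** 2 <= cutoff ** 2
    filtered = spectrum * mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))

smoothed = dft_lowpass(np.random.rand(128, 128))
```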
Low-light environments pose significant challenges for image enhancement methods. To address these challenges, in this work, we introduce the HUE dataset, a comprehensive collection of high-resolution event and frame sequences captured in diverse and challenging low-light conditions. Our dataset includes 106 sequences, encompassing indoor, cityscape, twilight, night, driving, and controlled scenarios, each carefully recorded to address various illumination levels and dynamic ranges. Utilizing a hybrid RGB and event camera setup, we collect a dataset that combines high-resolution event data with complementary frame data. We employ both qualitative and quantitative evaluations using no-reference metrics to assess state-of-the-art low-light enhancement and event-based image reconstruction methods. Additionally, we evaluate these methods on a downstream object detection task. Our findings reveal that while event-based methods perform well in specific metrics, they may produce false positives in practical applications. This dataset and our comprehensive analysis provide valuable insights for future research in low-light vision and hybrid camera systems.
https://arxiv.org/abs/2410.19164
Diffusion-based Low-Light Image Enhancement (LLIE) has demonstrated significant success in improving the visibility of low-light images. However, the substantial computational burden introduced by the iterative sampling process remains a major concern. Current acceleration methods, whether training-based or training-free, often lead to significant performance degradation. As a result, to obtain an efficient student model with performance comparable to existing multi-step teacher models, it is usually necessary to retrain a more capable teacher model. This approach is inflexible, as it requires additional training to improve the teacher's performance. To address these challenges, we propose Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT), a step distillation framework specifically designed for LLIE. ReDDiT trains a student model to replicate the teacher's trajectory in fewer steps while also being able to surpass the teacher's performance. Specifically, we first introduce a trajectory decoder from the teacher model to provide guidance. Subsequently, a reflectance-aware trajectory refinement module is incorporated into the distillation process to enable more deterministic guidance from the teacher model. Our framework matches the performance of previous diffusion-based methods that require many redundant steps in just 2 steps, while establishing new state-of-the-art (SOTA) results with 8 or 4 steps. Comprehensive experimental evaluations on 10 benchmark datasets validate the effectiveness of our method, which consistently outperforms existing SOTA methods.
https://arxiv.org/abs/2410.12346
Low-light image enhancement (LLIE) is essential for numerous computer vision tasks, including object detection, tracking, segmentation, and scene understanding. Despite substantial research on improving low-quality images captured in underexposed conditions, clear vision remains critical for autonomous vehicles, which often struggle with low-light scenarios, signifying the need for continuous research. However, paired datasets for LLIE are scarce, particularly for street scenes, limiting the development of robust LLIE methods. Despite using advanced transformers and/or diffusion-based models, current LLIE methods struggle in real-world low-light conditions and lack training on street-scene datasets, limiting their effectiveness for autonomous vehicles. To bridge these gaps, we introduce a new dataset, LoLI-Street (Low-Light Images of Streets), with 33k paired low-light and well-exposed images from street scenes in developed cities, covering 19k object classes for object detection. The LoLI-Street dataset also features 1,000 real low-light test images for testing LLIE models under real-life conditions. Furthermore, we propose a transformer- and diffusion-based LLIE model named "TriFuse". Leveraging the LoLI-Street dataset, we train and evaluate TriFuse and SOTA models to benchmark them on our dataset. Comparisons across various models and mainstream datasets demonstrate our dataset's generalization feasibility, with significantly enhanced images and improved object detection for practical applications in autonomous driving and surveillance systems. The complete code and dataset are available at this https URL.
https://arxiv.org/abs/2410.09831
Due to the nature of enhancement, namely the absence of paired ground-truth information, high-level vision tasks have recently been employed to evaluate the performance of low-light image enhancement. A widely used approach is to measure how accurately an object detector trained on low-light images enhanced by different candidates performs with respect to annotated semantic labels. In this paper, we first demonstrate that this approach is generally prone to overfitting and thus of limited measurement reliability. In search of a proper evaluation metric, we propose LIME-Bench, the first online benchmark platform designed to collect human preferences for low-light enhancement, providing a valuable dataset for validating the correlation between human perception and automated evaluation metrics. We then customize LIME-Eval, a novel evaluation framework that utilizes detectors pre-trained on standard-lighting datasets, without object annotations, to judge the quality of enhanced images. By adopting an energy-based strategy to assess the accuracy of output confidence maps, LIME-Eval simultaneously bypasses the biases associated with retraining detectors and circumvents the reliance on annotations for dim images. Comprehensive experiments are provided to demonstrate the effectiveness of LIME-Eval. Our benchmark platform (this https URL) and code (this https URL) are available online.
https://arxiv.org/abs/2410.08810
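The abstract does not spell out its energy formulation, so the sketch below shows one plausible reading: the standard free-energy score (negative log-sum-exp of class logits) averaged over a detector's output locations, with lower energy indicating more confident detections on the enhanced image. The detector output shape and temperature are placeholder assumptions, not LIME-Eval's actual implementation.

```python
import torch

def mean_energy(class_logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Free-energy score over per-location class logits: lower energy ~ more confident.

    class_logits: (num_locations, num_classes) raw logits from a detector head run on an
    enhanced image; the detector is pre-trained on normal-light data and never retrained.
    """
    energy = -temperature * torch.logsumexp(class_logits / temperature, dim=-1)
    return energy.mean()

# Toy usage with random logits standing in for a detector's confidence map.
logits = torch.randn(1000, 80)        # e.g. 1000 anchors x 80 COCO classes
print(float(mean_energy(logits)))     # compare this score across enhancement methods
```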
The purpose of this work is to investigate the soundness and utility of a neural network-based approach as a framework for exploring the impact of image enhancement techniques on visual cortex activation. In a preliminary study, we prepare a set of state-of-the-art brain encoding models, selected among the top 10 methods that participated in The Algonauts Project 2023 Challenge [16]. We analyze their ability to make valid predictions about the effects of various image enhancement techniques on neural responses. Given the impossibility of acquiring the actual data due to the high costs associated with brain imaging procedures, our investigation builds up on a series of experiments. Specifically, we analyze the ability of brain encoders to estimate the cerebral reaction to various augmentations by evaluating the response to augmentations targeting objects (i.e., faces and words) with known impact on specific areas. Moreover, we study the predicted activation in response to objects unseen during training, exploring the impact of semantically out-of-distribution stimuli. We provide relevant evidence for the generalization ability of the models forming the proposed framework, which appears to be promising for the identification of the optimal visual augmentation filter for a given task, model-driven design strategies as well as for AR and VR applications.
https://arxiv.org/abs/2410.04497
Novel view synthesis (NVS) aims to generate images at arbitrary viewpoints using multi-view images, and recent insights from neural radiance fields (NeRF) have contributed to remarkable improvements. Recently, studies on generalizable NeRF (G-NeRF) have addressed the challenge of per-scene optimization in NeRFs. The construction of radiance fields on-the-fly in G-NeRF simplifies the NVS process, making it well-suited for real-world applications. Meanwhile, G-NeRF still struggles in representing fine details for a specific scene due to the absence of per-scene optimization, even with texture-rich multi-view source inputs. As a remedy, we propose a Geometry-driven Multi-reference Texture transfer network (GMT) available as a plug-and-play module designed for G-NeRF. Specifically, we propose ray-imposed deformable convolution (RayDCN), which aligns input and reference features reflecting scene geometry. Additionally, the proposed texture preserving transformer (TP-Former) aggregates multi-view source features while preserving texture information. Consequently, our module enables direct interaction between adjacent pixels during the image enhancement process, which is deficient in G-NeRF models with an independent rendering process per pixel. This addresses constraints that hinder the ability to capture high-frequency details. Experiments show that our plug-and-play module consistently improves G-NeRF models on various benchmark datasets.
https://arxiv.org/abs/2410.00672
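RayDCN's ray-imposed offsets are the paper's contribution; the sketch below only shows the generic deformable-convolution primitive it builds on, using torchvision.ops.deform_conv2d with offsets predicted by an ordinary convolution instead of scene geometry.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class ToyDeformableConv(nn.Module):
    """3x3 deformable convolution whose sampling offsets are learned from the input.

    In RayDCN (per the abstract) the offsets would reflect ray/scene geometry;
    here they come from a plain conv purely for illustration.
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.offset_pred = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offset = self.offset_pred(x)                      # (N, 18, H, W)
        return deform_conv2d(x, offset, self.weight, padding=1)

features = torch.randn(1, 32, 64, 64)
aligned = ToyDeformableConv(32, 32)(features)
```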
Underwater images often suffer from quality degradation due to absorption and scattering effects. Most existing underwater image enhancement algorithms produce a single, fixed-color image, limiting user flexibility and application. To address this limitation, we propose a method called ColorCode, which enhances underwater images while offering a range of controllable color outputs. Our approach involves recovering an underwater image to a reference enhanced image through supervised training and decomposing it into color and content codes via self-reconstruction and cross-reconstruction. The color code is explicitly constrained to follow a Gaussian distribution, allowing for efficient sampling and interpolation during inference. ColorCode offers three key features: 1) color enhancement, producing an enhanced image with a fixed color; 2) color adaptation, enabling controllable adjustments of long-wavelength color components using guidance images; and 3) color interpolation, allowing for the smooth generation of multiple colors through continuous sampling of the color code. Quantitative and visual evaluations on popular and challenging benchmark datasets demonstrate the superiority of ColorCode over existing methods in providing diverse, controllable, and color-realistic enhancement results. The source code is available at this https URL.
https://arxiv.org/abs/2409.19685
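A toy sketch of the color-code behavior described above: because the color code is constrained to follow a Gaussian, new renditions can be obtained by sampling codes from the prior or interpolating between two codes. The decoder and code dimensionality here are placeholders, not the ColorCode architecture.

```python
import torch
import torch.nn as nn

code_dim = 8
decoder = nn.Sequential(              # stands in for a trained decoder
    nn.Linear(code_dim, 64), nn.ReLU(), nn.Linear(64, 3)
)

def decode(color_code: torch.Tensor) -> torch.Tensor:
    """Map a color code to (here) a single RGB tint; the real model produces an image."""
    return torch.sigmoid(decoder(color_code))

# 1) color enhancement: use a fixed code (e.g. the prior mean).
fixed = decode(torch.zeros(code_dim))
# 2) color sampling/adaptation: draw codes from the Gaussian prior.
sampled = decode(torch.randn(code_dim))
# 3) color interpolation: blend two codes and decode along the path.
z_a, z_b = torch.randn(code_dim), torch.randn(code_dim)
path = [decode((1 - t) * z_a + t * z_b) for t in torch.linspace(0, 1, 5)]
```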
The majority of deep learning methods utilize vanilla convolution for enhancing underwater images. While vanilla convolution excels at capturing local features and learning the spatial hierarchical structure of images, it tends to smooth input images, which can somewhat limit feature expression and modeling. A prominent characteristic of degraded underwater images is blur, and the goal of enhancement is to make the textures and details (high-frequency features) in the images more visible. Therefore, we believe that leveraging high-frequency features can improve enhancement performance. To this end, we introduce Pixel Difference Convolution (PDC), which focuses on gradient information with significant changes in the image, thereby improving the modeling of enhanced images. We propose an underwater image enhancement network, PDCFNet, based on PDC and cross-level feature fusion. Specifically, we design a detail enhancement module based on PDC that employs parallel PDCs to capture high-frequency features, leading to better detail and texture enhancement. The designed cross-level feature fusion module performs operations such as concatenation and multiplication on features from different levels, ensuring sufficient interaction and enhancement between diverse features. Our proposed PDCFNet achieves a PSNR of 27.37 and an SSIM of 92.02 on the UIEB dataset, attaining the best performance to date. Our code is available at this https URL.
https://arxiv.org/abs/2409.19269
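A minimal PyTorch sketch of a central pixel-difference convolution in the spirit of PDC: each output aggregates differences between neighboring pixels and the center pixel, which reduces to a vanilla convolution minus a 1x1 convolution with the spatially summed kernel. The "central" variant and the shapes are assumptions; PDCFNet's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralPDC(nn.Module):
    """y(p0) = sum_i w_i * (x(p_i) - x(p0)), emphasizing local gradients (high frequencies)."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.padding = kernel_size // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        vanilla = F.conv2d(x, self.weight, padding=self.padding)
        # Subtract the center-pixel term: equivalent to a 1x1 conv with the summed kernel.
        center = F.conv2d(x, self.weight.sum(dim=(2, 3), keepdim=True))
        return vanilla - center

edges = CentralPDC(3, 16)(torch.randn(1, 3, 64, 64))   # responds strongly to texture and detail
```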
Low-light image enhancement (LIE) aims at precisely and efficiently recovering an image degraded in poor illumination environments. Recent advanced LIE techniques use deep neural networks, which require large numbers of low/normal-light image pairs, network parameters, and computational resources. As a result, their practicality is limited. In this work, we devise a novel unsupervised LIE framework based on diffusion priors and lookup tables (DPLUT) to achieve efficient low-light image recovery. The proposed approach comprises two critical components: a light adjustment lookup table (LLUT) and a noise suppression lookup table (NLUT). LLUT is optimized with a set of unsupervised losses; it aims to predict pixel-wise curve parameters for the dynamic range adjustment of a specific image. NLUT is designed to remove the noise amplified after brightening. As diffusion models are sensitive to noise, diffusion priors are introduced to achieve high-performance noise suppression. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in terms of visual quality and efficiency.
https://arxiv.org/abs/2409.18899
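The light-adjustment lookup table predicts pixel-wise curve parameters; as one concrete (assumed) instantiation, the sketch below applies a Zero-DCE-style quadratic brightening curve per pixel, iterated a few times. The parameters here are a constant tensor standing in for the LLUT output.

```python
import torch

def apply_light_curve(image: torch.Tensor, alpha: torch.Tensor, iterations: int = 4) -> torch.Tensor:
    """Brighten an image in [0, 1] with per-pixel curve parameters alpha in [-1, 1].

    Each iteration applies x <- x + alpha * x * (1 - x), a monotone curve that
    expands dark values more than bright ones.
    """
    x = image
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)
    return x.clamp(0.0, 1.0)

low_light = torch.rand(1, 3, 128, 128) * 0.2            # dim input
alpha = torch.full_like(low_light, 0.8)                  # stand-in for predicted curve parameters
brightened = apply_light_curve(low_light, alpha)
```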
Underwater vision is crucial for autonomous underwater vehicles (AUVs), and enhancing degraded underwater images in real time on a resource-constrained AUV is a key challenge, both because of factors like light absorption and scattering and because of the model complexity needed to resolve them. Traditional image enhancement techniques lack adaptability to varying underwater conditions, while learning-based methods, particularly those using convolutional neural networks (CNNs) and generative adversarial networks (GANs), offer more robust solutions but face limitations such as inadequate enhancement, unstable training, or mode collapse. Denoising diffusion probabilistic models (DDPMs) have emerged as a state-of-the-art approach in image-to-image tasks, but the recent UW-DDPM solution requires intensive computation to achieve the desired underwater image enhancement (UIE). To address these challenges, this paper introduces UW-DiffPhys, a novel physics-based and diffusion-based UIE approach. UW-DiffPhys combines lightweight physics-based UIE network components with a denoising U-Net to replace the computationally intensive distribution-transformation U-Net in the existing UW-DDPM framework, reducing complexity while maintaining performance. Additionally, the Denoising Diffusion Implicit Model (DDIM) is employed to accelerate the inference process through non-Markovian sampling. Experimental results demonstrate that UW-DiffPhys achieves a substantial reduction in computational complexity and inference time compared to UW-DDPM, with competitive performance on key metrics such as PSNR, SSIM, and UCIQE, and an improvement in the overall underwater image quality metric UIQM. The implementation code can be found at the following repository: this https URL
https://arxiv.org/abs/2409.18476
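The abstract credits DDIM's non-Markovian sampling for the speed-up; below is the standard deterministic DDIM update (eta = 0) for a noise-prediction network, written in plain PyTorch with placeholder schedule values rather than the paper's actual model.

```python
import torch

def ddim_step(x_t: torch.Tensor, eps_pred: torch.Tensor,
              alpha_bar_t: float, alpha_bar_prev: float) -> torch.Tensor:
    """One deterministic DDIM update from timestep t to a (possibly much earlier) kept timestep.

    x_t:          current noisy sample
    eps_pred:     the network's noise prediction at timestep t
    alpha_bar_*:  cumulative products of (1 - beta) at t and at the previous kept step
    """
    x0_pred = (x_t - (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_bar_t ** 0.5
    return alpha_bar_prev ** 0.5 * x0_pred + (1 - alpha_bar_prev) ** 0.5 * eps_pred

# Skipping many timesteps per update is what makes DDIM sampling fast.
x = torch.randn(1, 3, 64, 64)
x = ddim_step(x, torch.randn_like(x), alpha_bar_t=0.5, alpha_bar_prev=0.9)
```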
Cone Beam Computed Tomography (CBCT) finds diverse applications in medicine. Ensuring high image quality in CBCT scans is essential for accurate diagnosis and treatment delivery. Yet, the susceptibility of CBCT images to noise and artifacts undermines both their usefulness and reliability. Existing methods typically address CBCT artifacts through image-to-image translation approaches. These methods, however, are limited by the artifact types present in the training data, which may not cover the complete spectrum of CBCT degradations stemming from variations in imaging protocols. Gathering additional data to encompass all possible scenarios can often pose a challenge. To address this, we present SinoSynth, a physics-based degradation model that simulates various CBCT-specific artifacts to generate a diverse set of synthetic CBCT images from high-quality CT images without requiring pre-aligned data. Through extensive experiments, we demonstrate that several different generative networks trained on our synthesized data achieve remarkable results on heterogeneous multi-institutional datasets, outperforming even the same networks trained on actual data. We further show that our degradation model conveniently provides an avenue to enforce anatomical constraints in conditional generative models, yielding high-quality and structure-preserving synthetic CT images.
https://arxiv.org/abs/2409.18355
Despite the impressive advancements made by recent low-light image enhancement techniques, the scarcity of paired data has emerged as a significant obstacle to further progress. This work proposes a mean-teacher-based semi-supervised low-light enhancement (Semi-LLIE) framework that integrates unpaired data into model training. The mean-teacher technique is a prominent semi-supervised learning method, successfully adopted for high-level and low-level vision tasks. However, two primary issues prevent the naive mean-teacher method from attaining optimal performance in low-light image enhancement. First, a pixel-wise consistency loss is insufficient for transferring a realistic illumination distribution from the teacher to the student model, which results in color casts in the enhanced images. Second, cutting-edge image enhancement approaches fail to cooperate effectively with the mean-teacher framework to restore detailed information in dark areas, because they tend to overlook modeling structured information within local regions. To mitigate these issues, we first introduce a semantic-aware contrastive loss to faithfully transfer the illumination distribution, contributing to enhanced images with natural colors. Then, we design a Mamba-based low-light image enhancement backbone that strengthens Mamba's ability to represent local pixel relationships via a multi-scale feature learning scheme, facilitating the generation of images with rich textural details. Further, we propose a novel perceptual loss based on the large-scale vision-language Recognize Anything Model (RAM) to help generate enhanced images with richer textural details. The experimental results indicate that our Semi-LLIE surpasses existing methods in both quantitative and qualitative metrics.
https://arxiv.org/abs/2409.16604
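A bare-bones sketch of the mean-teacher mechanics the paper builds on: the teacher is an exponential moving average of the student, and unpaired images contribute a consistency loss between the two. The plain pixel-wise MSE consistency shown here is exactly what the abstract argues is insufficient (Semi-LLIE replaces it with a semantic-aware contrastive loss), and the tiny network is a placeholder for the real enhancement backbone.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def update_teacher(momentum: float = 0.999) -> None:
    """EMA update: the teacher slowly tracks the student's weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(momentum).add_(s_p, alpha=1.0 - momentum)

unpaired_low_light = torch.rand(4, 3, 64, 64)
consistency = F.mse_loss(student(unpaired_low_light), teacher(unpaired_low_light))
consistency.backward()
update_teacher()
```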
Adversarial attacks in computer vision exploit the vulnerabilities of machine learning models by introducing subtle perturbations to input data, often leading to incorrect predictions or classifications. These attacks have grown in sophistication with the advent of deep learning, presenting significant challenges in critical applications, where they can be harmful to society. However, there is also a rich line of research from a transformative perspective that leverages adversarial techniques for social good. Specifically, we examine the rise of proactive schemes: methods that encrypt input data using additional signals, termed templates, to enhance the performance of deep learning models. By embedding these imperceptible templates into digital media, proactive schemes are applied across various applications, from simple image enhancement to complex deep learning frameworks, in contrast to passive schemes, which do not alter the input data distribution. The survey delves into the methodologies behind these proactive schemes, their encryption and learning processes, and their application to modern computer vision and natural language processing tasks. Additionally, it discusses the challenges, potential vulnerabilities, and future directions for proactive schemes, ultimately highlighting their potential to foster the responsible and secure advancement of deep learning technologies.
https://arxiv.org/abs/2409.16491
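A toy illustration of the proactive idea described above: a learnable template is embedded into the input at low amplitude so that it stays roughly imperceptible, and the downstream model consumes the encrypted input instead of the raw one. The template shape and embedding strength are arbitrary assumptions, not a scheme from the survey.

```python
import torch

template = torch.nn.Parameter(torch.randn(3, 224, 224) * 0.01)   # learnable signal ("template")

def encrypt(image: torch.Tensor, strength: float = 2.0 / 255.0) -> torch.Tensor:
    """Embed the template at a low, roughly imperceptible amplitude."""
    return (image + strength * torch.tanh(template)).clamp(0.0, 1.0)

batch = torch.rand(8, 3, 224, 224)
protected = encrypt(batch)     # feed this to the downstream model instead of `batch`
```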
Low-light images are commonly encountered in real-world scenarios, and numerous low-light image enhancement (LLIE) methods have been proposed to improve the visibility of these images. The primary goal of LLIE is to generate clearer images that are more visually pleasing to humans. However, the impact of LLIE methods in high-level vision tasks, such as image classification and object detection, which rely on high-quality image datasets, is not well explored. To explore the impact, we comprehensively evaluate LLIE methods on these high-level vision tasks by utilizing an empirical investigation comprising image classification and object detection experiments. The evaluation reveals a dichotomy: while Low-Light Image Enhancement (LLIE) methods enhance human visual interpretation, their effect on computer vision tasks is inconsistent and can sometimes be harmful. Our findings suggest a disconnect between image enhancement for human visual perception and for machine analysis, indicating a need for LLIE methods tailored to support high-level vision tasks effectively. This insight is crucial for the development of LLIE techniques that align with the needs of both human and machine vision.
https://arxiv.org/abs/2409.14461
We propose FD3, a fundus image enhancement method based on direct diffusion bridges, which can cope with a wide range of complex degradations, including haze, blur, noise, and shadow. We first propose a synthetic forward model through a human feedback loop with board-certified ophthalmologists for maximal quality improvement of low-quality in-vivo images. Using the proposed forward model, we train a robust and flexible diffusion-based image enhancement network that is highly effective as a stand-alone method, unlike previous diffusion model-based approaches which act only as a refiner on top of pre-trained models. Through extensive experiments, we show that FD3 establishes superior quality not only on synthetic degradations but also on in vivo studies with low-quality fundus photos taken from patients with cataracts or small pupils. To promote further research in this area, we open-source all our code and data used for this research at this https URL
https://arxiv.org/abs/2409.12377
Ultrasound imaging, despite its widespread use in medicine, often suffers from various sources of noise and artifacts that impact the signal-to-noise ratio and overall image quality. Enhancing ultrasound images requires a delicate balance between contrast, resolution, and speckle preservation. This paper introduces a novel approach that integrates adaptive beamforming with denoising diffusion-based variance imaging to address this challenge. By applying Eigenspace-Based Minimum Variance (EBMV) beamforming and employing a denoising diffusion model fine-tuned on ultrasound data, our method computes the variance across multiple diffusion-denoised samples to produce high-quality despeckled images. This approach leverages both the inherent multiplicative noise of ultrasound and the stochastic nature of diffusion models. Experimental results on a publicly available dataset demonstrate the effectiveness of our method in achieving superior image reconstructions from single plane-wave acquisitions. The code is available at: this https URL.
https://arxiv.org/abs/2409.11380
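A compact sketch of the variance-imaging step described above: the same beamformed input is denoised several times by a stochastic denoiser, and the pixel-wise variance across the samples forms the output. The denoiser below is a trivial stand-in for the fine-tuned diffusion model, and the EBMV beamforming stage is not shown.

```python
import torch

def stochastic_denoiser(image: torch.Tensor) -> torch.Tensor:
    """Placeholder for a diffusion model fine-tuned on ultrasound data; each call
    starts from different noise, so repeated calls give different plausible samples."""
    return image + 0.05 * torch.randn_like(image)

def variance_image(beamformed: torch.Tensor, num_samples: int = 8) -> torch.Tensor:
    samples = torch.stack([stochastic_denoiser(beamformed) for _ in range(num_samples)])
    return samples.var(dim=0)    # pixel-wise variance across diffusion-denoised samples

despeckled_map = variance_image(torch.rand(1, 256, 256))
```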
Retinal fundus photography is significant in diagnosing and monitoring retinal diseases. However, systemic imperfections and operator/patient-related factors can hinder the acquisition of high-quality retinal images. Previous efforts in retinal image enhancement primarily relied on GANs, which are limited by the trade-off between training stability and output diversity. In contrast, the Schrödinger Bridge (SB) offers a more stable solution by utilizing Optimal Transport (OT) theory to model a stochastic differential equation (SDE) between two arbitrary distributions. This allows SB to effectively transform low-quality retinal images into their high-quality counterparts. In this work, we leverage the SB framework to propose an image-to-image translation pipeline for retinal image enhancement. Additionally, previous methods often fail to capture fine structural details, such as blood vessels. To address this, we enhance our pipeline by introducing Dynamic Snake Convolution, whose tortuous receptive field can better preserve tubular structures. We name the resulting retinal fundus image enhancement framework the Context-aware Unpaired Neural Schrödinger Bridge (CUNSB-RFIE). To the best of our knowledge, this is the first endeavor to use the SB approach for retinal image enhancement. Experimental results on a large-scale dataset demonstrate the advantage of the proposed method compared to several state-of-the-art supervised and unsupervised methods in terms of image quality and performance on downstream tasks. The code is available at this https URL.
https://arxiv.org/abs/2409.10966