The use of digital content (photos and videos) in a wide range of applications has increased with the popularity of multimedia devices. These applications include advertising campaigns, educational resources, and social networking platforms. As people become more visually oriented, the need for high-quality visual information keeps growing. However, captured images frequently suffer from poor visibility and high noise due to the limitations of image-capturing devices and lighting conditions. Low-illumination image enhancement aims to improve the visual quality of images taken under poor lighting. Traditional image enhancement techniques address this problem by adjusting brightness, contrast, and noise. Recently, however, deep learning-based methods have dominated advances in this area; they effectively reduce noise while preserving important information and show promising results for enhancing low-illumination images. This paper provides an extensive summary of image signal processing methods for enhancing low-illumination images. The review classifies approaches into three categories: traditional approaches, deep learning-based methods, and hybrid techniques. Conventional techniques include denoising, automatic white balancing, and noise reduction. Deep learning-based techniques use convolutional neural networks (CNNs) to recognize and extract features from low-light images. Hybrid approaches combine deep learning-based methodologies with more conventional methods to obtain better results. The review also discusses the advantages and limitations of each approach and provides insights into future research directions in this field.
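As a concrete illustration of the traditional category surveyed above, the following is a minimal Python sketch (not taken from any of the reviewed papers) that combines gray-world white balancing, gamma correction, and CLAHE-based contrast enhancement; all parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

def enhance_low_light(bgr: np.ndarray, gamma: float = 0.6) -> np.ndarray:
    """Classical low-illumination enhancement: gray-world white balance,
    gamma brightening, and CLAHE contrast stretching on the L channel."""
    img = bgr.astype(np.float32)

    # Gray-world white balance: scale each channel toward the global mean.
    channel_means = img.reshape(-1, 3).mean(axis=0)
    img *= channel_means.mean() / (channel_means + 1e-6)
    img = np.clip(img, 0, 255)

    # Gamma correction (exponent < 1 brightens dark regions).
    img = 255.0 * (img / 255.0) ** gamma

    # CLAHE on the L channel of LAB to boost local contrast without color shifts.
    lab = cv2.cvtColor(img.astype(np.uint8), cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)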
https://arxiv.org/abs/2502.05995
Fingerprint recognition remains one of the most reliable biometric technologies due to its high accuracy and uniqueness. Traditional systems rely on contact-based scanners, which are prone to issues such as image degradation from surface contamination and inconsistent user interaction. To address these limitations, contactless fingerprint recognition has emerged as a promising alternative, providing non-intrusive and hygienic authentication. This study evaluates the impact of image enhancement techniques on the performance of pre-trained deep learning models using transfer learning for touchless fingerprint recognition. The IIT-Bombay Touchless and Touch-Based Fingerprint Database, containing data from 200 subjects, was employed to test the performance of deep learning architectures such as VGG-16, VGG-19, Inception-V3, and ResNet-50. Experimental results reveal that transfer learning methods with fingerprint image enhancement (indirect method) significantly outperform those without enhancement (direct method). Specifically, VGG-16 achieved an accuracy of 98% in training and 93% in testing when using the enhanced images, demonstrating superior performance compared to the direct method. This paper provides a detailed comparison of the effectiveness of image enhancement in improving the accuracy of transfer learning models for touchless fingerprint recognition, offering key insights for developing more efficient biometric systems.
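A minimal PyTorch sketch of the indirect (enhance-then-transfer) setup described above is given below; the CLAHE enhancement step, layer-freezing choices, input size, and class count are assumptions for illustration, not the authors' exact configuration.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_SUBJECTS = 200  # one identity class per subject (assumed labeling of the IIT-Bombay data)

def enhance_fingerprint(gray: np.ndarray) -> np.ndarray:
    """Illustrative enhancement applied before the CNN (the 'indirect' route):
    CLAHE boosts ridge/valley contrast in the touchless capture."""
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    return clahe.apply(gray)  # expects a uint8 grayscale image

# Pre-trained VGG-16 backbone with its classifier head replaced for fingerprint identities.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False                       # freeze the convolutional features
model.classifier[6] = nn.Linear(4096, NUM_SUBJECTS)

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Grayscale(num_output_channels=3),  # replicate the enhanced gray image to 3 channels
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```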
https://arxiv.org/abs/2502.04680
Measuring the connectivity of water in rivers and streams is essential for effective water resource management. Increased extreme weather events associated with climate change can result in alterations to river and stream connectivity. While traditional stream flow gauges are costly to deploy and limited to large river bodies, trail camera methods are a low-cost and easily deployed alternative for collecting hourly data. Image capture, however, requires stream ecologists to manually curate (select and label) tens of thousands of images per year. To improve this workflow, we developed an automated in-stream trail camera image classification system consisting of three parts: (1) image processing, (2) image augmentation, and (3) machine learning. The image preprocessing consists of seven image quality filters, foliage-based luma variance reduction, resizing, and bottom-center cropping. Images are balanced using a variable amount of generative augmentation with diffusion models and are then passed, in labeled form, to a machine learning classification model. By using the vision transformer architecture and temporal image enhancement in our framework, we increase the base accuracy from 75% to 90% on images from a new, unseen site. We make use of a dataset captured and labeled by staff from the Connecticut Department of Energy and Environmental Protection between 2018 and 2020. Our results indicate that a combination of temporal image processing and attention-based models is effective at classifying unseen river connectivity images.
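The resizing and bottom-center cropping step of the preprocessing pipeline can be sketched as follows (the quality filters, luma variance reduction, and diffusion-based augmentation are omitted); the crop proportions and target resolution are assumptions.

```python
import numpy as np
from PIL import Image

def bottom_center_crop(img: Image.Image, crop_w: int, crop_h: int) -> Image.Image:
    """Crop a crop_w x crop_h window anchored at the bottom-center of the frame,
    where the stream channel typically appears in trail-camera imagery."""
    w, h = img.size
    left = (w - crop_w) // 2
    top = h - crop_h
    return img.crop((left, top, left + crop_w, top + crop_h))

def preprocess(path: str, size: int = 224) -> np.ndarray:
    """Load, crop the lower half of the frame, resize, and scale to [0, 1]."""
    img = Image.open(path).convert("RGB")
    img = bottom_center_crop(img, crop_w=img.width, crop_h=img.height // 2)
    img = img.resize((size, size))
    return np.asarray(img, dtype=np.float32) / 255.0
```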
https://arxiv.org/abs/2502.00474
We introduce a novel segmentation-aware joint training framework, the generative reinforcement network (GRN), that integrates segmentation loss feedback to optimize both image generation and segmentation performance in a single stage. We also develop an image enhancement technique called segmentation-guided enhancement (SGE), in which the generator produces images tailored specifically for the segmentation model. Two variants of GRN were developed as well: GRN for sample-efficient learning (GRN-SEL) and GRN for semi-supervised learning (GRN-SSL). GRN's performance was evaluated using a dataset of 69 fully annotated 3D ultrasound scans from 29 subjects. The annotations included six anatomical structures: dermis, superficial fat, superficial fascial membrane (SFM), deep fat, deep fascial membrane (DFM), and muscle. Our results show that GRN-SEL with SGE reduces labeling efforts by up to 70% while achieving a 1.98% improvement in the Dice Similarity Coefficient (DSC) compared to models trained on fully labeled datasets. GRN-SEL alone reduces labeling efforts by 60%, GRN-SSL with SGE decreases labeling requirements by 70%, and GRN-SSL alone by 60%, all while maintaining performance comparable to fully supervised models. These findings suggest the effectiveness of the GRN framework in optimizing segmentation performance with significantly less labeled data, offering a scalable and efficient solution for ultrasound image analysis and reducing the burdens associated with data annotation.
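A minimal PyTorch sketch of the single-stage idea, where the generator is trained with segmentation-loss feedback, is shown below; treating the generator as an image-to-image enhancer and the specific loss weighting are assumptions, not the published GRN objective.

```python
import torch
import torch.nn.functional as F

def joint_step(generator, segmenter, optimizer, image, mask, lam=1.0):
    """One illustrative single-stage update: the generated (enhanced) image is
    scored by the segmenter, and the segmentation loss flows back into the
    generator so its outputs are tailored to the segmentation task."""
    optimizer.zero_grad()
    enhanced = generator(image)                       # segmentation-guided enhancement (assumed image-to-image)
    recon_loss = F.l1_loss(enhanced, image)           # keep the output close to the input scan
    seg_logits = segmenter(enhanced)                  # (N, C, H, W) class logits
    seg_loss = F.cross_entropy(seg_logits, mask)      # mask: (N, H, W) integer labels
    (recon_loss + lam * seg_loss).backward()          # joint feedback in a single stage
    optimizer.step()
    return recon_loss.item(), seg_loss.item()
```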
https://arxiv.org/abs/2501.17690
Image restoration aims to recover details and enhance contrast in degraded images. With the growing demand for high-quality imaging (e.g., 4K and 8K), achieving a balance between restoration quality and computational efficiency has become increasingly critical. Existing methods, primarily based on CNNs, Transformers, or their hybrid approaches, apply uniform deep representation extraction across the image. However, these methods often struggle to effectively model long-range dependencies and largely overlook the spatial characteristics of image degradation (regions with richer textures tend to suffer more severe damage), making it hard to achieve the best trade-off between restoration quality and efficiency. To address these issues, we propose a novel texture-aware image restoration method, TAMambaIR, which simultaneously perceives image textures and achieves a trade-off between performance and efficiency. Specifically, we introduce a novel Texture-Aware State Space Model, which enhances texture awareness and improves efficiency by modulating the transition matrix of the state-space equation and focusing on regions with complex textures. Additionally, we design a Multi-Directional Perception Block to improve multi-directional receptive fields while maintaining low computational overhead. Extensive experiments on benchmarks for image super-resolution, deraining, and low-light image enhancement demonstrate that TAMambaIR achieves state-of-the-art performance with significantly improved efficiency, establishing it as a robust and efficient framework for image restoration.
https://arxiv.org/abs/2501.16583
Activities in underwater environments are paramount in several scenarios, which drives the continuous development of underwater image enhancement techniques. A major challenge in this domain is the depth at which images are captured, with increasing depth resulting in a darker environment. Most existing methods for underwater image enhancement focus on noise removal and color adjustment, with few works dedicated to brightness enhancement. This work introduces a novel unsupervised learning approach to underwater image enhancement using a diffusion model. Our method, called UDBE, is based on conditional diffusion to maintain the brightness details of the unpaired input images. The input image is combined with a color map and a Signal-Noise Relation map (SNR) to ensure stable training and prevent color distortion in the output images. The results demonstrate that our approach achieves an impressive accuracy rate on the well-established underwater image benchmarks UIEB, SUIM, and RUIE. Additionally, the experiments validate the robustness of our approach with respect to the image quality metrics PSNR, SSIM, UIQM, and UISM, indicating the good performance of the brightness enhancement process. The source code is available here: this https URL.
https://arxiv.org/abs/2501.16211
We present a physics-informed deep learning framework to address common limitations in Confocal Laser Scanning Microscopy (CLSM), such as diffraction-limited resolution, noise, and undersampling due to low laser power conditions. The optical system's point spread function (PSF) and common CLSM image degradation mechanisms, namely photon shot noise, dark current noise, motion blur, speckle noise, and undersampling, were modeled and directly incorporated into the model architecture. The model reconstructs high-fidelity images from heavily noisy inputs using convolutional and transposed convolutional layers. Following advances in compressed sensing, our approach significantly reduces data acquisition requirements without compromising image resolution. The proposed method was extensively evaluated on simulated CLSM images of diverse structures, including lipid droplets, neuronal networks, and fibrillar systems. Comparisons with traditional deconvolution algorithms such as Richardson-Lucy (RL) and non-negative least squares (NNLS), as well as other methods such as Total Variation (TV) regularization, Wiener filtering, and wavelet denoising, demonstrate the superiority of the network in restoring fine structural details with high fidelity. Assessment metrics such as the Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR) underline that the AdaptivePhysicsAutoencoder achieved robust image enhancement across diverse CLSM conditions, enabling faster acquisition, reduced photodamage, and reliable performance in low-light and sparse-sampling scenarios, holding promise for applications in live-cell imaging, dynamic biological studies, and high-throughput material characterization.
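The degradation forward model described above can be sketched with NumPy/SciPy as follows; the PSF width, photon budget, dark-noise level, and undersampling factor are illustrative assumptions (motion blur and speckle are omitted for brevity).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def degrade_clsm(img: np.ndarray, psf_sigma: float = 2.0, photons: float = 50.0,
                 dark_std: float = 0.01, undersample: int = 2) -> np.ndarray:
    """Simulate CLSM degradation: Gaussian PSF blur, Poisson shot noise at a
    given photon budget, additive dark-current noise, and grid undersampling."""
    blurred = gaussian_filter(img, sigma=psf_sigma)          # diffraction-limited PSF
    shot = rng.poisson(blurred * photons) / photons          # photon shot noise
    noisy = shot + rng.normal(0.0, dark_std, img.shape)      # dark-current noise
    return noisy[::undersample, ::undersample]               # low-laser-power sampling

clean = np.clip(rng.random((256, 256)), 0, 1)   # placeholder specimen image in [0, 1]
degraded = degrade_clsm(clean)                  # network input; `clean` is the target
```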
https://arxiv.org/abs/2501.14709
In image enhancement tasks, such as low-light and underwater image enhancement, a degraded image can correspond to multiple plausible target images due to dynamic photography conditions, such as variations in illumination. This naturally results in a one-to-many mapping challenge. To address this, we propose a Bayesian Enhancement Model (BEM) that incorporates Bayesian Neural Networks (BNNs) to capture data uncertainty and produce diverse outputs. To achieve real-time inference, we introduce a two-stage approach: Stage I employs a BNN to model the one-to-many mappings in the low-dimensional space, while Stage II refines fine-grained image details using a Deterministic Neural Network (DNN). To accelerate BNN training and convergence, we introduce a dynamic Momentum Prior. Extensive experiments on multiple low-light and underwater image enhancement benchmarks demonstrate the superiority of our method over deterministic models.
https://arxiv.org/abs/2501.14265
Accurate prediction of pedestrian trajectories is crucial for enhancing the safety of autonomous vehicles and reducing traffic fatalities involving pedestrians. While numerous studies have focused on modeling interactions among pedestrians to forecast their movements, the influence of environmental factors and scene-object placements has been comparatively underexplored. In this paper, we present a novel trajectory prediction model that integrates both pedestrian interactions and environmental context to improve prediction accuracy. Our approach captures spatial and temporal interactions among pedestrians within a sparse graph framework. To account for pedestrian-scene interactions, we employ advanced image enhancement and semantic segmentation techniques to extract detailed scene features. These scene and interaction features are then fused through a cross-attention mechanism, enabling the model to prioritize relevant environmental factors that influence pedestrian movements. Finally, a temporal convolutional network processes the fused features to predict future pedestrian trajectories. Experimental results demonstrate that our method significantly outperforms existing state-of-the-art approaches, achieving ADE and FDE values of 0.252 and 0.372 meters, respectively, underscoring the importance of incorporating both social interactions and environmental context in pedestrian trajectory prediction.
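A minimal PyTorch sketch of the cross-attention fusion of interaction and scene features is given below; the feature dimensions, number of heads, and residual/normalization choices are assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class SceneCrossAttention(nn.Module):
    """Cross-attention fusion: pedestrian interaction features attend to scene
    features extracted from the enhanced and semantically segmented frame."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, interaction: torch.Tensor, scene: torch.Tensor) -> torch.Tensor:
        # interaction: (B, num_peds, dim); scene: (B, num_patches, dim)
        fused, _ = self.attn(query=interaction, key=scene, value=scene)
        return self.norm(interaction + fused)   # residual connection keeps social cues

fusion = SceneCrossAttention()
ped_feats = torch.randn(2, 8, 64)      # 8 pedestrians per sample (illustrative)
scene_feats = torch.randn(2, 196, 64)  # 14x14 grid of scene patches (illustrative)
out = fusion(ped_feats, scene_feats)   # (2, 8, 64), fed to the temporal conv network
```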
https://arxiv.org/abs/2501.13848
X-ray imaging is the most widely used medical imaging modality. However, in common practice, radiologists frequently complain about inconsistency in the initial presentation of X-ray images. Different patient positions, patient habitus, and scanning protocols can lead to differences in image presentation, e.g., global or regional differences in brightness and contrast. To compensate for this, clinical experts must perform additional work to adjust the images to the desired presentation, which can be time-consuming. Existing deep-learning-based end-to-end solutions can automatically correct images with promising performance. Nevertheless, these methods are hard to interpret and difficult for clinical experts to understand. In this manuscript, a novel interpretable mapping method based on deep learning is proposed, which automatically enhances image brightness and contrast both globally and locally. Meanwhile, because the model is inspired by the workflow of brightness and contrast manipulation, it can provide interpretable pixel maps that explain the rationale behind the enhancement. Experiments on clinical datasets show that the proposed method provides consistent brightness and contrast correction on X-ray images, achieving 24.75 dB PSNR and 0.8431 SSIM.
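One way to picture the interpretable pixel maps is as per-pixel gain and bias applied to the input, as in the NumPy sketch below; the map-prediction network itself is omitted and the values shown are illustrative.

```python
import numpy as np

def apply_pixel_maps(img: np.ndarray, gain_map: np.ndarray,
                     bias_map: np.ndarray) -> np.ndarray:
    """Interpretable correction: each output pixel is gain * input + bias, so the
    predicted maps directly show where and how brightness/contrast were changed."""
    return np.clip(gain_map * img + bias_map, 0.0, 1.0)

# Global correction = spatially constant maps; local correction = spatially varying maps.
xray = np.random.rand(512, 512).astype(np.float32)        # placeholder X-ray image
global_out = apply_pixel_maps(xray,
                              gain_map=np.full_like(xray, 1.3),
                              bias_map=np.full_like(xray, -0.05))
local_gain = 1.0 + 0.5 * np.linspace(0, 1, 512)[None, :].repeat(512, axis=0)
local_out = apply_pixel_maps(xray, local_gain, np.zeros_like(xray))
```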
https://arxiv.org/abs/2501.12245
Low-light image enhancement (LLE) aims to improve the visual quality of images captured in poorly lit conditions, which often suffer from low brightness, low contrast, noise, and color distortions. These issues hinder the performance of computer vision tasks such as object detection, facial recognition, and autonomous driving. Existing enhancement techniques, such as multi-scale fusion and histogram equalization, fail to preserve fine details and often struggle with maintaining the natural appearance of enhanced images under complex lighting conditions. Although the Retinex theory provides a foundation for image decomposition, it often amplifies noise, leading to suboptimal image quality. In this paper, we propose the Dual Light Enhance Network (DLEN), a novel architecture that incorporates two distinct attention mechanisms, considering both spatial and frequency domains. Our model introduces a learnable wavelet transform module in the illumination estimation phase, preserving high- and low-frequency components to enhance edge and texture details. Additionally, we design a dual-branch structure that leverages the power of the Transformer architecture to enhance both the illumination and structural components of the image. In extensive experiments, our model outperforms state-of-the-art methods on standard datasets. Code is available here: this https URL
https://arxiv.org/abs/2501.12235
Low-Light Image Enhancement (LLIE) is a key task in computational photography and imaging. The problem of enhancing images captured at night or in dark environments has been well studied in the image signal processing literature. However, current deep learning-based solutions struggle with efficiency and robustness in real-world scenarios (e.g., scenes with noise, saturated pixels, or bad illumination). We propose a lightweight neural network that combines image processing in the frequency and spatial domains. Our method, FLOL+, is one of the fastest models for this task, achieving state-of-the-art results on popular real-scene datasets such as LOL and LSRW. Moreover, we are able to process 1080p images in under 12 ms. Code and models at this https URL
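A toy NumPy sketch of combining frequency-domain and spatial-domain processing, in the spirit of (but not reproducing) the FLOL+ design, is shown below; the low-frequency gain and filter size are assumptions.

```python
import numpy as np

def frequency_spatial_enhance(img: np.ndarray, low_freq_gain: float = 2.0,
                              kernel_size: int = 3) -> np.ndarray:
    """Toy frequency + spatial pipeline: amplify low spatial frequencies (which
    carry global illumination), then apply a small spatial box filter for noise."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    mask = np.exp(-(xx ** 2 + yy ** 2) / (2 * (0.05 * max(h, w)) ** 2))
    boosted = spectrum * (1.0 + (low_freq_gain - 1.0) * mask)   # brighten via low frequencies
    enhanced = np.fft.ifft2(np.fft.ifftshift(boosted)).real

    pad = kernel_size // 2                                      # spatial box filter
    padded = np.pad(enhanced, pad, mode="edge")
    out = np.zeros_like(enhanced)
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            out += padded[dy:dy + h, dx:dx + w]
    return np.clip(out / kernel_size ** 2, 0.0, 1.0)

low_light = np.random.rand(256, 256) * 0.2   # placeholder dark image in [0, 0.2]
result = frequency_spatial_enhance(low_light)
```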
https://arxiv.org/abs/2501.09718
With the development of deep learning, numerous methods for low-light image enhancement (LLIE) have demonstrated remarkable performance. Mainstream LLIE methods typically learn an end-to-end mapping based on pairs of low-light and normal-light images. However, normal-light images under varying illumination conditions serve as reference images, making it difficult to define a "perfect" reference image. This leads to the challenge of reconciling metric-oriented and visually friendly results. Recently, many cross-modal studies have found that side information from other related modalities can guide visual representation learning. Based on this, we introduce a Natural Language Supervision (NLS) strategy, which learns feature maps from text corresponding to images, offering a general and flexible interface for describing an image under different illumination. However, image distributions conditioned on textual descriptions are highly multimodal, which makes training difficult. To address this issue, we design a Textual Guidance Conditioning Mechanism (TCM) that incorporates the connections between image regions and sentence words, enhancing the ability to capture fine-grained cross-modal cues for images and text. This strategy not only utilizes a wider range of supervision sources but also provides a new paradigm for LLIE based on visual and textual feature alignment. In order to effectively identify and merge features from various levels of image and textual information, we design an Information Fusion Attention (IFA) module to enhance different regions at different levels. We integrate the proposed TCM and IFA into a Natural Language Supervision network for LLIE, named NaLSuper. Finally, extensive experiments demonstrate the robustness and superior effectiveness of our proposed NaLSuper.
https://arxiv.org/abs/2501.06546
In recent years, there has been a surge of research focused on underwater image enhancement using Generative Adversarial Networks (GANs), driven by the need to overcome the challenges posed by underwater environments. Issues such as light attenuation, scattering, and color distortion severely degrade the quality of underwater images, limiting their use in critical applications. GANs have emerged as a powerful tool for enhancing underwater photos due to their ability to learn complex transformations and generate realistic outputs. These advancements have been applied to real-world applications, including marine biology and ecosystem monitoring, coral reef health assessment, underwater archaeology, and autonomous underwater vehicle (AUV) navigation. This paper explores all major approaches to underwater image enhancement, from physical and physics-free models to Convolutional Neural Network (CNN)-based models and state-of-the-art GAN-based methods. It provides a comprehensive analysis of these methods, evaluation metrics, datasets, and loss functions, offering a holistic view of the field. Furthermore, the paper delves into the limitations and challenges faced by current methods, such as generalization issues, high computational demands, and dataset biases, while suggesting potential directions for future research.
https://arxiv.org/abs/2501.06273
Recent advancements in image translation for enhancing mixed-exposure images have demonstrated the transformative potential of deep learning algorithms. However, addressing extreme exposure variations in images remains a significant challenge due to the inherent complexity and contrast inconsistencies across regions. Current methods often struggle to adapt effectively to these variations, resulting in suboptimal performance. In this work, we propose HipyrNet, a novel approach that integrates a HyperNetwork within a Laplacian Pyramid-based framework to tackle the challenges of mixed-exposure image enhancement. The inclusion of a HyperNetwork allows the model to adapt to these exposure variations: a HyperNetwork dynamically generates the weights of another network, allowing dynamic changes during deployment. In our model, the HyperNetwork is used to predict optimal kernels for Feature Pyramid decomposition, which enables a tailored and adaptive decomposition process for each input image. Our enhanced translational network incorporates multiscale decomposition and reconstruction, leveraging dynamic kernel prediction to capture and manipulate features across varying scales. Extensive experiments demonstrate that HipyrNet outperforms existing methods, particularly in scenarios with extreme exposure variations, achieving superior results in both qualitative and quantitative evaluations. Our approach sets a new benchmark for mixed-exposure image enhancement, paving the way for future research in adaptive image translation.
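The core idea of a HyperNetwork predicting per-image decomposition kernels can be sketched in PyTorch as follows; the image descriptor, kernel size, and network shapes are illustrative assumptions, not HipyrNet's actual modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelHyperNet(nn.Module):
    """Predicts a per-image 2D low-pass kernel from a global exposure descriptor,
    so the pyramid decomposition filter adapts to each input's exposure."""
    def __init__(self, k: int = 5):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, k * k))

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        stats = img.mean(dim=(2, 3))                     # (B, 3) per-channel mean exposure
        kernel = self.mlp(stats).view(-1, 1, self.k, self.k)
        return torch.softmax(kernel.flatten(1), dim=1).view_as(kernel)  # normalized weights

def adaptive_decompose(img: torch.Tensor, hyper: KernelHyperNet):
    """One pyramid level using a dynamically predicted kernel: low-pass band
    plus the residual detail (Laplacian) band."""
    b, c, h, w = img.shape
    k = hyper(img).repeat_interleave(c, dim=0)           # one kernel per image, shared over channels
    x = img.reshape(1, b * c, h, w)
    low = F.conv2d(x, k, padding=hyper.k // 2, groups=b * c).reshape(b, c, h, w)
    return low, img - low

hyper = KernelHyperNet()
low, detail = adaptive_decompose(torch.rand(2, 3, 64, 64), hyper)
```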
https://arxiv.org/abs/2501.05195
In this paper, we propose a novel low-light image enhancement method aimed at improving the performance of recognition models. Despite recent advances in deep learning, the recognition of images under low-light conditions remains a challenge. Although existing low-light image enhancement methods have been developed to improve image visibility for human vision, they do not specifically focus on enhancing recognition model performance. Our proposed low-light image enhancement method consists of two key modules: the Global Enhance Module, which adjusts the overall brightness and color balance of the input image, and the Pixelwise Adjustment Module, which refines image features at the pixel level. These modules are trained to enhance input images so that downstream recognition model performance improves effectively. Notably, the proposed method can be applied as a frontend filter to improve low-light recognition performance without requiring retraining of downstream recognition models. Experimental results demonstrate that our method improves the performance of pretrained recognition models under low-light conditions, confirming its effectiveness.
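A minimal PyTorch sketch of using such a module as a frontend filter in front of a frozen, pretrained recognition model is shown below; the module structure is illustrative and not the paper's exact Global Enhance / Pixelwise Adjustment design.

```python
import torch
import torch.nn as nn
from torchvision import models

class FrontendEnhancer(nn.Module):
    """Toy frontend: a global gain/offset (brightness and color balance) followed
    by a small pixelwise residual convolution."""
    def __init__(self):
        super().__init__()
        self.global_gain = nn.Parameter(torch.ones(1, 3, 1, 1))
        self.global_bias = nn.Parameter(torch.zeros(1, 3, 1, 1))
        self.pixelwise = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.global_gain * x + self.global_bias            # global adjustment
        return torch.clamp(x + self.pixelwise(x), 0.0, 1.0)    # pixel-level refinement

recognizer = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
recognizer.eval()
for p in recognizer.parameters():
    p.requires_grad = False          # downstream model stays frozen (no retraining)

enhancer = FrontendEnhancer()        # only this frontend filter would be trained
logits = recognizer(enhancer(torch.rand(1, 3, 224, 224)))
```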
https://arxiv.org/abs/2501.04210
Consistency models have emerged as a promising alternative to diffusion models, offering high-quality generative capabilities through single-step sample generation. However, their application to multi-domain image translation tasks, such as cross-modal translation and low-light image enhancement, remains largely unexplored. In this paper, we introduce Conditional Consistency Models (CCMs) for multi-domain image translation by incorporating additional conditional inputs. We implement these modifications by introducing task-specific conditional inputs that guide the denoising process, ensuring that the generated outputs retain structural and contextual information from the corresponding input domain. We evaluate CCMs on 10 different datasets, demonstrating their effectiveness in producing high-quality translated images across multiple domains. Code is available at this https URL.
https://arxiv.org/abs/2501.01223
Thanks to recent achievements in task-driven image quality enhancement (IQE) models such as ESTR, the image enhancement model and the visual recognition model can mutually boost each other's quantitative performance while producing high-quality processed images that are perceivable by human vision systems. However, existing task-driven IQE models tend to overlook an underlying fact -- different levels of vision tasks have varying and sometimes conflicting requirements of image features. To address this problem, this paper proposes a generalized gradient promotion (GradProm) training strategy for task-driven IQE of medical images. Specifically, we partition a task-driven IQE system into two sub-models, i.e., a mainstream model for image enhancement and an auxiliary model for visual recognition. During training, GradProm updates only the parameters of the image enhancement model, using the gradients of both the visual recognition model and the image enhancement model, but only when the gradients of these two sub-models are aligned in the same direction, as measured by their cosine similarity. When the gradients of the two sub-models are not in the same direction, GradProm uses only the gradient of the image enhancement model to update its parameters. Theoretically, we have proved that the optimization direction of the image enhancement model will not be biased by the auxiliary visual recognition model under GradProm. Empirically, extensive experimental results on four public yet challenging medical image datasets demonstrate the superior performance of GradProm over existing state-of-the-art methods.
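A minimal PyTorch sketch of the gradient-alignment rule described above is given below; the parameter flattening and the way the two losses are passed in are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gradprom_step(enh_model, enh_loss, rec_loss, optimizer):
    """Update only the enhancement model. The recognition-task gradient is added
    only when it points in the same direction as the enhancement gradient
    (positive cosine similarity); otherwise only the enhancement gradient is used."""
    params = [p for p in enh_model.parameters() if p.requires_grad]
    g_enh = torch.autograd.grad(enh_loss, params, retain_graph=True)
    g_rec = torch.autograd.grad(rec_loss, params, retain_graph=True, allow_unused=True)
    g_rec = [torch.zeros_like(p) if g is None else g for p, g in zip(params, g_rec)]

    flat_enh = torch.cat([g.flatten() for g in g_enh])
    flat_rec = torch.cat([g.flatten() for g in g_rec])
    aligned = F.cosine_similarity(flat_enh, flat_rec, dim=0) > 0

    optimizer.zero_grad()
    for p, ge, gr in zip(params, g_enh, g_rec):
        p.grad = ge + gr if aligned else ge.clone()   # auxiliary gradient only if aligned
    optimizer.step()
```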
https://arxiv.org/abs/2501.01114
Although significant progress has been made in enhancing visibility, retrieving texture details, and mitigating noise in Low-Light (LL) images, the challenge persists in applying current Low-Light Image Enhancement (LLIE) methods to real-world scenarios, primarily due to the diverse illumination conditions encountered. Furthermore, the quest for generating enhancements that are visually realistic and attractive remains an underexplored realm. In response to these challenges, we introduce a novel LLIE framework with the guidance of Generative Perceptual Priors (GPP-LLIE) derived from vision-language models (VLMs). Specifically, we first propose a pipeline that guides VLMs to assess multiple visual attributes of the LL image and quantify the assessment to output the global and local perceptual priors. Subsequently, to incorporate these generative perceptual priors to benefit LLIE, we introduce a transformer-based backbone in the diffusion process, and develop a new layer normalization (GPP-LN) and an attention mechanism (LPP-Attn) guided by global and local perceptual priors. Extensive experiments demonstrate that our model outperforms current SOTA methods on paired LL datasets and exhibits superior generalization on real-world data. The code is released at this https URL.
https://arxiv.org/abs/2412.20916
Vehicle detection and tracking in satellite video is essential in remote sensing (RS) applications. However, a statistical analysis of existing datasets shows that dim vehicles with low radiation intensity and limited contrast against the background are rarely annotated, which limits the effectiveness of existing approaches in detecting moving vehicles under low radiation conditions. In this paper, we address this challenge by building a Small and Dim Moving Cars (SDM-Car) dataset with a multitude of annotations for dim vehicles in satellite videos; the dataset was collected by the Luojia 3-01 satellite and comprises 99 high-quality videos. Furthermore, we propose a method based on image enhancement and attention mechanisms to improve the detection accuracy of dim vehicles, serving as a benchmark for evaluating the dataset. Finally, we assess the performance of several representative methods on SDM-Car and present insightful findings. The dataset is openly available at this https URL.
https://arxiv.org/abs/2412.18214