Surveillance systems play a critical role in security and reconnaissance, but their performance is often compromised by low-quality images and videos, reducing the accuracy of face recognition. Additionally, existing AI-based facial analysis models suffer from biases related to skin tone variations and partially occluded faces, further limiting their effectiveness in diverse real-world scenarios. These challenges stem from data limitations and imbalances: available training datasets lack sufficient diversity, resulting in unfair and unreliable facial recognition performance. To address these issues, we propose a data-driven platform that enhances surveillance capabilities by generating synthetic training data tailored to compensate for dataset biases. Our approach leverages deep learning-based facial attribute manipulation and reconstruction using autoencoders and Generative Adversarial Networks (GANs) to create diverse, high-quality facial datasets. Additionally, our system integrates an image enhancement module that improves the clarity of low-resolution or occluded faces in surveillance footage. We evaluate our approach on the CelebA dataset, demonstrating that the proposed platform enhances both training data diversity and model fairness. This work contributes to reducing bias in AI-based facial analysis and improving surveillance accuracy in challenging environments, leading to fairer and more reliable security applications.
https://arxiv.org/abs/2506.06578
Under extreme low-light conditions, traditional frame-based cameras, due to their limited dynamic range and temporal resolution, suffer from detail loss and motion blur in captured images. To overcome this bottleneck, researchers have introduced event cameras and proposed event-guided low-light image enhancement algorithms. However, these methods neglect the influence of global low-frequency noise caused by dynamic lighting conditions and of local structural discontinuities in sparse event data. To address these issues, we propose an innovative Bidirectional guided Low-light Image Enhancement framework (BiLIE). Specifically, to mitigate the significant low-frequency noise introduced by global illumination step changes, we introduce a frequency high-pass filtering-based Event Feature Enhancement (EFE) module at the event representation level to suppress the interference of low-frequency information while preserving and highlighting high-frequency details. In addition, we design a Bidirectional Cross Attention Fusion (BCAF) mechanism to acquire high-frequency structures and edges while suppressing the structural discontinuities and local noise introduced by sparse event guidance, thereby generating smoother fused images. Furthermore, considering the poor visual quality and color bias of existing datasets, we provide a new dataset (RELIE) with high-quality ground truth produced through a reliable enhancement scheme. Extensive experimental results demonstrate that the proposed BiLIE outperforms state-of-the-art methods by 0.96 dB in PSNR and 0.03 in LPIPS.
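As a rough illustration of the frequency high-pass filtering idea behind EFE (not the authors' implementation), the sketch below zeroes the low-frequency band of an event voxel grid in the Fourier domain; the tensor layout and cutoff ratio are assumptions.

```python
import torch

def highpass_filter_events(event_voxel: torch.Tensor, cutoff_ratio: float = 0.1) -> torch.Tensor:
    """Suppress low-frequency content in an event representation.

    event_voxel:  (B, C, H, W) tensor, e.g. a temporal-bin voxel grid (assumed layout).
    cutoff_ratio: fraction of the half-spectrum radius treated as "low frequency".
    """
    _, _, H, W = event_voxel.shape
    spectrum = torch.fft.fftshift(torch.fft.fft2(event_voxel), dim=(-2, -1))

    # Binary high-pass mask: zero out a disc centred on the DC component.
    yy, xx = torch.meshgrid(
        torch.arange(H, device=event_voxel.device),
        torch.arange(W, device=event_voxel.device),
        indexing="ij",
    )
    dist = torch.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2)
    mask = (dist > cutoff_ratio * min(H, W) / 2).float()

    filtered = spectrum * mask
    return torch.fft.ifft2(torch.fft.ifftshift(filtered, dim=(-2, -1))).real
```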
https://arxiv.org/abs/2506.06120
Removing reflections is a crucial task in computer vision, with significant applications in photography and image enhancement. Nevertheless, existing methods are constrained by the absence of large-scale, high-quality, and diverse datasets. In this paper, we present a novel benchmark for Single Image Reflection Removal (SIRR). We have developed a large-scale dataset containing 5,300 high-quality, pixel-aligned image pairs, each consisting of a reflection image and its corresponding clean version. Specifically, 5,000 pairs are used for training and 300 for validation. Additionally, we include 100 real-world testing images without ground truth (GT) to further evaluate the practical performance of reflection removal methods. All image pairs are precisely aligned at the pixel level to guarantee accurate supervision. The dataset encompasses a broad spectrum of real-world scenarios, featuring various lighting conditions, object types, and reflection patterns, and its training, validation, and test splits facilitate thorough evaluation. To validate the usefulness of our dataset, we train a U-Net-based model and evaluate it using five widely used metrics: PSNR, SSIM, LPIPS, DISTS, and NIQE. We will release both the dataset and the code at this https URL to facilitate future research in this field.
https://arxiv.org/abs/2506.05482
Image degradation is a prevalent issue in various real-world applications, affecting visual quality and downstream processing tasks. In this study, we propose a novel framework that employs a Vision-Language Model (VLM) to automatically classify degraded images into predefined categories. The VLM categorizes an input image into one of four degradation types: (A) super-resolution degradation (including noise, blur, and JPEG compression), (B) reflection artifacts, (C) motion blur, or (D) no visible degradation (high-quality image). Once classified, images assigned to categories A, B, or C undergo targeted restoration using dedicated models tailored for each specific degradation type. The final output is a restored image with improved visual quality. Experimental results demonstrate the effectiveness of our approach in accurately classifying image degradations and enhancing image quality through specialized restoration models. Our method presents a scalable and automated solution for real-world image enhancement tasks, leveraging the capabilities of VLMs in conjunction with state-of-the-art restoration techniques.
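A minimal routing sketch of the classify-then-restore pipeline described above, assuming a hypothetical classify_degradation wrapper around the VLM and placeholder restorer callables; none of these names come from the paper.

```python
from typing import Callable, Dict

# Hypothetical registry of dedicated restoration models, one per degradation category.
RESTORERS: Dict[str, Callable] = {
    "A": lambda img: img,  # placeholder: super-resolution / denoising / JPEG-artifact model
    "B": lambda img: img,  # placeholder: reflection-removal model
    "C": lambda img: img,  # placeholder: motion-deblurring model
}

def classify_degradation(image) -> str:
    """Stand-in for the VLM classifier; should return 'A', 'B', 'C', or 'D'."""
    raise NotImplementedError("wrap the vision-language model prompt/inference here")

def restore(image):
    """Route the image to the matching restorer, or pass it through if it is clean ('D')."""
    label = classify_degradation(image)
    return image if label == "D" else RESTORERS[label](image)
```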
https://arxiv.org/abs/2506.05450
We present a novel dual-stream architecture that achieves state-of-the-art underwater image enhancement by explicitly integrating the Jaffe-McGlamery physical model with capsule clustering-based feature representation learning. Our method simultaneously estimates transmission maps and spatially-varying background light through a dedicated physics estimator while extracting entity-level features via capsule clustering in a parallel stream. This physics-guided approach enables parameter-free enhancement that respects underwater formation constraints while preserving semantic structures and fine-grained details. Our approach also features a novel optimization objective ensuring both physical adherence and perceptual quality across multiple spatial frequencies. To validate our approach, we conducted extensive experiments across six challenging benchmarks. Results demonstrate consistent improvements of +0.5 dB PSNR over the best existing methods while requiring only one-third of their computational complexity (FLOPs), or alternatively, more than +1 dB PSNR improvement when compared to methods with similar computational budgets. Code and data will be available at this https URL.
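For reference, the simplified Jaffe-McGlamery formation model underlying the physics stream is I = J * t + B * (1 - t); the snippet below shows how scene radiance could be recovered once transmission and background light are estimated. Shapes and the clamping range are assumptions, not the paper's code.

```python
import torch

def invert_underwater_formation(observed: torch.Tensor,
                                transmission: torch.Tensor,
                                background: torch.Tensor,
                                eps: float = 1e-3) -> torch.Tensor:
    """Recover scene radiance J from I = J * t + B * (1 - t).

    observed:     (B, 3, H, W) underwater image I in [0, 1]
    transmission: (B, 1, H, W) or (B, 3, H, W) transmission map t in (0, 1]
    background:   (B, 3, H, W) spatially-varying background light B
    """
    t = transmission.clamp(min=eps)                    # avoid division by zero
    restored = (observed - background * (1.0 - t)) / t
    return restored.clamp(0.0, 1.0)
```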
https://arxiv.org/abs/2506.04753
Low-light image denoising and enhancement are challenging, especially when traditional noise assumptions, such as Gaussian noise, do not hold. In many real-world scenarios, such as low-light imaging, noise is signal-dependent and is better represented as Poisson noise. In this work, we address the problem of denoising images degraded by Poisson noise under extreme low-light conditions. We introduce a lightweight deep learning-based method that integrates Retinex-based decomposition with Poisson denoising into a unified encoder-decoder network. The model simultaneously enhances illumination and suppresses noise by incorporating a Poisson denoising loss to address signal-dependent noise. Without requiring reflectance or illumination priors, the network learns an effective decomposition process while ensuring consistent reflectance and smooth illumination without introducing color distortion. The experimental results demonstrate the effectiveness and practicality of the proposed low-light illumination enhancement method. Our method significantly improves visibility and brightness in low-light conditions, while preserving image structure and color constancy under ambient illumination.
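A minimal sketch of how a Retinex reconstruction term can be combined with a Poisson negative log-likelihood data term, assuming the network outputs a reflectance/illumination pair and the observation scales with photon counts; the smoothness weight and eps are illustrative choices, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def retinex_poisson_loss(reflectance: torch.Tensor,
                         illumination: torch.Tensor,
                         noisy_input: torch.Tensor,
                         lam: float = 0.1) -> torch.Tensor:
    """Retinex reconstruction (R * L ~ I) scored under a Poisson noise model."""
    recon = reflectance * illumination
    # Poisson NLL with rate = recon: rate - target * log(rate + eps)
    data_term = F.poisson_nll_loss(recon, noisy_input, log_input=False, eps=1e-6)
    # Total-variation-style smoothness prior on the illumination map
    tv = (illumination[..., :, 1:] - illumination[..., :, :-1]).abs().mean() + \
         (illumination[..., 1:, :] - illumination[..., :-1, :]).abs().mean()
    return data_term + lam * tv
```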
https://arxiv.org/abs/2506.04470
In nighttime conditions, high noise levels and bright illumination sources degrade image quality, making low-light image enhancement challenging. Thermal images provide complementary information, offering richer textures and structural details. We propose RT-X Net, a cross-attention network that fuses RGB and thermal images for nighttime image enhancement. We leverage self-attention networks for feature extraction and a cross-attention mechanism for fusion to effectively integrate information from both modalities. To support research in this domain, we introduce the Visible-Thermal Image Enhancement Evaluation (V-TIEE) dataset, comprising 50 co-located visible and thermal images captured under diverse nighttime conditions. Extensive evaluations on the publicly available LLVIP dataset and our V-TIEE dataset demonstrate that RT-X Net outperforms state-of-the-art methods in low-light image enhancement. The code and the V-TIEE dataset are available at this https URL.
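The cross-attention fusion can be pictured with the minimal module below, in which RGB features query thermal features; the channel width, head count, and residual wiring are assumptions rather than the RT-X Net design.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Minimal cross-attention block: RGB features attend to thermal features."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, thermal_feat: (B, C, H, W) feature maps with C == dim
        B, C, H, W = rgb_feat.shape
        q = rgb_feat.flatten(2).transpose(1, 2)       # queries from RGB,    (B, HW, C)
        kv = thermal_feat.flatten(2).transpose(1, 2)  # keys/values thermal, (B, HW, C)
        fused, _ = self.attn(self.norm(q), kv, kv)
        fused = fused + q                             # residual connection
        return fused.transpose(1, 2).reshape(B, C, H, W)
```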
https://arxiv.org/abs/2505.24705
Low-Light Image Enhancement (LLIE) is crucial for improving both human perception and computer vision tasks. This paper addresses two challenges in zero-reference LLIE: obtaining perceptually 'good' images using the Contrastive Language-Image Pre-Training (CLIP) model and maintaining computational efficiency for high-resolution images. We propose CLIP-Utilized Reinforcement learning-based Visual image Enhancement (CURVE). CURVE employs a simple image processing module that adjusts global image tone based on a Bézier curve and iteratively estimates its processing parameters. The estimator is trained by reinforcement learning with rewards designed using CLIP text embeddings. Experiments on low-light and multi-exposure datasets demonstrate the enhancement quality and processing speed of CURVE compared to conventional methods.
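The global tone adjustment can be sketched as a cubic Bézier curve with fixed endpoints whose two inner control points act as the processing parameters estimated by the policy; the lookup-table discretisation below is an illustrative choice, not the paper's code.

```python
import torch

def bezier_tone_curve(image: torch.Tensor, p1: float, p2: float, steps: int = 256) -> torch.Tensor:
    """Apply a cubic Bezier tone curve with endpoints fixed at 0 and 1.

    image: tensor with values in [0, 1]; p1, p2: inner control points (policy outputs).
    """
    t = torch.linspace(0.0, 1.0, steps, device=image.device)
    # Cubic Bezier with P0 = 0 and P3 = 1: B(t) = 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3
    curve = 3 * (1 - t) ** 2 * t * p1 + 3 * (1 - t) * t ** 2 * p2 + t ** 3
    # Look up each pixel's output value on the discretised curve.
    idx = (image.clamp(0.0, 1.0) * (steps - 1)).long()
    return curve[idx]
```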
https://arxiv.org/abs/2505.23102
Existing low-light image enhancement (LLIE) and joint LLIE and deblurring (LLIE-deblur) models have made strides in addressing predefined degradations, yet they are often constrained by dynamically coupled degradations. To address these challenges, we introduce a Unified Receptance Weighted Key Value (URWKV) model with a multi-state perspective, enabling flexible and effective degradation restoration for low-light images. Specifically, we customize the core URWKV block to perceive and analyze complex degradations by leveraging multiple intra- and inter-stage states. First, inspired by the pupil mechanism in the human visual system, we propose Luminance-adaptive Normalization (LAN), which adjusts normalization parameters based on rich inter-stage states, allowing for adaptive, scene-aware luminance modulation. Second, we aggregate multiple intra-stage states through an exponential moving average approach, effectively capturing subtle variations while mitigating the information loss inherent in a single-state mechanism. To reduce the degradation effects commonly associated with conventional skip connections, we propose the State-aware Selective Fusion (SSF) module, which dynamically aligns and integrates multi-state features across encoder stages, selectively fusing contextual information. In comparison to state-of-the-art models, our URWKV model achieves superior performance on various benchmarks while requiring significantly fewer parameters and computational resources.
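The intra-stage state aggregation can be illustrated with a plain exponential moving average over a list of feature states; the momentum value and list ordering are assumptions.

```python
import torch

def ema_aggregate(states, momentum: float = 0.9) -> torch.Tensor:
    """Fuse a sequence of intra-stage feature states with an exponential moving average.

    states: list of tensors with identical shape, ordered by processing step.
    """
    aggregated = states[0]
    for state in states[1:]:
        aggregated = momentum * aggregated + (1.0 - momentum) * state
    return aggregated
```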
https://arxiv.org/abs/2505.23068
Document Image Enhancement (DIE) serves as a critical component in Document AI systems, where its performance substantially determines the effectiveness of downstream tasks. To address the limitations of existing methods confined to single-degradation restoration or grayscale image processing, we present the Global with Local Parametric Generation Enhancement Network (GL-PGENet), a novel architecture designed for multi-degraded color document images, ensuring both efficiency and robustness in real-world scenarios. Our solution incorporates three key innovations. First, a hierarchical enhancement framework that integrates global appearance correction with local refinement, enabling coarse-to-fine quality improvement. Second, a Dual-Branch Local-Refine Network with parametric generation mechanisms that replaces conventional direct prediction, producing enhanced outputs through learned intermediate parametric representations rather than pixel-wise mapping; this approach enhances local consistency while improving model generalization. Finally, a modified NestUNet architecture incorporating dense blocks to effectively fuse low-level pixel features and high-level semantic features, specifically adapted to document image characteristics. In addition, to enhance generalization performance, we adopt a two-stage training strategy: large-scale pretraining on a synthetic dataset of 500,000+ samples followed by task-specific fine-tuning. Extensive experiments demonstrate the superiority of GL-PGENet, achieving state-of-the-art SSIM scores of 0.7721 on DocUNet and 0.9480 on RealDAE. The model also exhibits remarkable cross-domain adaptability and maintains computational efficiency for high-resolution images without performance degradation, confirming its practical utility in real-world scenarios.
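To make the parametric-generation idea concrete, the sketch below predicts a coarse grid of per-channel gain/offset parameters and applies them to the input instead of regressing output pixels directly; the layer sizes and the gain/offset parameterisation are illustrative assumptions, not the GL-PGENet design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParametricLocalRefiner(nn.Module):
    """Predict low-resolution (gain, offset) parameter maps instead of output pixels,
    then apply them to the input: an illustrative stand-in for parametric generation."""

    def __init__(self, in_channels: int = 3, grid: int = 16):
        super().__init__()
        self.param_head = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(grid),
            nn.Conv2d(32, 2 * in_channels, 1),   # per-region gain and offset per channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        params = self.param_head(x)                              # (B, 2C, grid, grid)
        params = F.interpolate(params, size=(H, W), mode="bilinear", align_corners=False)
        gain, offset = params[:, :C], params[:, C:]
        return torch.clamp(x * (1.0 + gain) + offset, 0.0, 1.0)
```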
https://arxiv.org/abs/2505.22021
Occupancy prediction aims to estimate the 3D spatial distribution of occupied regions along with their corresponding semantic labels. Existing vision-based methods perform well on daytime benchmarks but struggle in nighttime scenarios due to limited visibility and challenging lighting conditions. To address these challenges, we propose LIAR, a novel framework that learns illumination-affined representations. LIAR first introduces Selective Low-light Image Enhancement (SLLIE), which leverages the illumination priors from daytime scenes to adaptively determine whether a nighttime image is genuinely dark or sufficiently well-lit, enabling more targeted global enhancement. Building on the illumination maps generated by SLLIE, LIAR further incorporates two illumination-aware components: 2D Illumination-guided Sampling (2D-IGS) and 3D Illumination-driven Projection (3D-IDP), to respectively tackle local underexposure and overexposure. Specifically, 2D-IGS modulates feature sampling positions according to illumination maps, assigning larger offsets to darker regions and smaller ones to brighter regions, thereby alleviating feature degradation in underexposed areas. Subsequently, 3D-IDP enhances semantic understanding in overexposed regions by constructing illumination intensity fields and supplying refined residual queries to the BEV context refinement process. Extensive experiments on both real and synthetic datasets demonstrate the superior performance of LIAR under challenging nighttime scenarios. The source code and pretrained models are available at this https URL.
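The 2D-IGS idea of scaling sampling offsets by darkness can be sketched with grid_sample: darker pixels receive larger offsets, brighter ones smaller. The maximum scale factor and the normalised-offset convention are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def illumination_guided_sample(feat: torch.Tensor,
                               base_offset: torch.Tensor,
                               illum: torch.Tensor,
                               max_scale: float = 2.0) -> torch.Tensor:
    """Scale feature-sampling offsets by darkness and resample the feature map.

    feat:        (B, C, H, W) feature map
    base_offset: (B, 2, H, W) offsets in normalised [-1, 1] grid units
    illum:       (B, 1, H, W) illumination map in [0, 1] (0 = dark, 1 = bright)
    """
    B, C, H, W = feat.shape
    # Larger modulation where illumination is low.
    scale = 1.0 + (max_scale - 1.0) * (1.0 - illum)           # (B, 1, H, W)
    offset = base_offset * scale                               # broadcast over the 2 channels

    # Build a normalised identity grid and add the modulated offsets.
    ys = torch.linspace(-1, 1, H, device=feat.device)
    xs = torch.linspace(-1, 1, W, device=feat.device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack((gx, gy), dim=-1).expand(B, H, W, 2)    # (B, H, W, 2), (x, y) order
    grid = grid + offset.permute(0, 2, 3, 1)

    return F.grid_sample(feat, grid, align_corners=True)
```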
https://arxiv.org/abs/2505.20641
Underwater images are often affected by complex degradations such as light absorption, scattering, color casts, and artifacts, making enhancement critical for effective object detection, recognition, and scene understanding in aquatic environments. Existing methods, especially diffusion-based approaches, typically rely on synthetic paired datasets due to the scarcity of real underwater references, introducing bias and limiting generalization. Furthermore, fine-tuning these models can degrade learned priors, resulting in unrealistic enhancements due to domain shifts. To address these challenges, we propose UDAN-CLIP, an image-to-image diffusion framework pre-trained on synthetic underwater datasets and enhanced with a customized classifier based on a vision-language model, a spatial attention module, and a novel CLIP-Diffusion loss. The classifier preserves natural in-air priors and semantically guides the diffusion process, while the spatial attention module focuses on correcting localized degradations such as haze and low contrast. The proposed CLIP-Diffusion loss further strengthens visual-textual alignment and helps maintain semantic consistency during enhancement. The proposed contributions empower our UDAN-CLIP model to perform more effective underwater image enhancement, producing results that are not only visually compelling but also more realistic and detail-preserving. These improvements are consistently validated through both quantitative metrics and qualitative visual comparisons, demonstrating the model's ability to correct distortions and restore natural appearance in challenging underwater conditions.
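One plausible reading of the visual-textual alignment term is a cosine-distance penalty between CLIP embeddings of the enhanced image and a target prompt; the function below assumes the embeddings are computed elsewhere and is not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def clip_alignment_loss(image_embed: torch.Tensor, text_embed: torch.Tensor) -> torch.Tensor:
    """Penalise misalignment between CLIP embeddings of the enhanced image and a
    target description (e.g. "a clear underwater photo").

    image_embed: (B, D) CLIP image features of the diffusion output
    text_embed:  (B, D) or (1, D) CLIP text features of the target prompt
    """
    image_embed = F.normalize(image_embed, dim=-1)
    text_embed = F.normalize(text_embed, dim=-1)
    cosine = (image_embed * text_embed).sum(dim=-1)
    return (1.0 - cosine).mean()
```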
https://arxiv.org/abs/2505.19895
Novel view synthesis in 360° scenes from extremely sparse input views is essential for applications like virtual reality and augmented reality. This paper presents a novel framework for novel view synthesis in extremely sparse-view cases. As typical structure-from-motion methods are unable to estimate camera poses in extremely sparse-view cases, we apply DUSt3R to estimate camera poses and generate a dense point cloud. Using the estimated camera poses, we densely sample additional views from the upper hemisphere space of the scenes, from which we render synthetic images together with the point cloud. Training a 3D Gaussian Splatting model on a combination of reference images from sparse views and densely sampled synthetic images allows larger scene coverage in 3D space, addressing the overfitting challenge due to the limited input in sparse-view cases. Retraining a diffusion-based image enhancement model on our created dataset, we further improve the quality of the point-cloud-rendered images by removing artifacts. We compare our framework with benchmark methods in cases of only four input views, demonstrating significant improvement in novel view synthesis under extremely sparse-view conditions for 360° scenes.
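Densely sampling additional views from the upper hemisphere can be done, for example, by drawing camera positions uniformly over the hemisphere surface around the scene centre; the sketch below is a generic illustration, with the radius and random seed as assumptions.

```python
import numpy as np

def sample_upper_hemisphere_cameras(n_views: int, radius: float, center=np.zeros(3)) -> np.ndarray:
    """Sample camera positions on the upper hemisphere around a scene centre.

    Returns an (n_views, 3) array of positions; look-at targets would be the centre.
    """
    rng = np.random.default_rng(0)
    azimuth = rng.uniform(0.0, 2.0 * np.pi, n_views)
    z = rng.uniform(0.0, 1.0, n_views)     # uniform in height => uniform over the hemisphere
    r_xy = np.sqrt(1.0 - z ** 2)
    positions = np.stack([
        radius * r_xy * np.cos(azimuth),
        radius * r_xy * np.sin(azimuth),
        radius * z,                         # z >= 0: upper hemisphere only
    ], axis=1) + center
    return positions
```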
https://arxiv.org/abs/2505.19264
All-in-one image restoration aims to recover clear images from various degradation types and levels with a unified model. Nonetheless, the significant variations among degradation types present challenges for training a universal model, often resulting in task interference, where the gradient update directions of different tasks may diverge due to shared parameters. To address this issue, motivated by the routing strategy, we propose DFPIR, a novel all-in-one image restorer that introduces Degradation-aware Feature Perturbations (DFP) to adjust the feature space to align with the unified parameter space. In this paper, the feature perturbations primarily include channel-wise perturbations and attention-wise perturbations. Specifically, channel-wise perturbations are implemented by shuffling the channels in high-dimensional space guided by degradation types, while attention-wise perturbations are achieved through selective masking in the attention space. To achieve these goals, we propose a Degradation-Guided Perturbation Block (DGPB) to implement these two functions, positioned between the encoding and decoding stages of the encoder-decoder architecture. Extensive experimental results demonstrate that DFPIR achieves state-of-the-art performance on several all-in-one image restoration tasks including image denoising, image dehazing, image deraining, motion deblurring, and low-light image enhancement. Our codes are available at this https URL.
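A minimal sketch of a degradation-guided channel-wise perturbation: the channel permutation is seeded by the degradation type, so each task perturbs the shared feature space in its own consistent way. The seeding scheme is an illustrative assumption, not the DGPB implementation.

```python
import torch

def degradation_guided_channel_shuffle(feat: torch.Tensor, degradation_id: int) -> torch.Tensor:
    """Permute feature channels with a permutation determined by the degradation type.

    feat: (B, C, H, W) features; degradation_id: integer index (e.g. 0=noise, 1=haze, ...).
    """
    num_channels = feat.shape[1]
    gen = torch.Generator().manual_seed(degradation_id)        # deterministic per degradation type
    perm = torch.randperm(num_channels, generator=gen).to(feat.device)
    return feat[:, perm]
```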
https://arxiv.org/abs/2505.12630
Image enhancement methods often prioritize pixel-level information, overlooking semantic features. We propose a novel, unsupervised, fuzzy-inspired image enhancement framework guided by the NSGA-II algorithm that optimizes image brightness, contrast, and gamma parameters to achieve a balance between visual quality and semantic fidelity. Central to our proposed method is the use of a pre-trained deep neural network as a feature extractor. To find the best enhancement settings, we use a GPU-accelerated NSGA-II algorithm that balances multiple objectives, namely increasing image entropy, improving perceptual similarity, and maintaining appropriate brightness. We further improve the results by applying a local search phase to fine-tune the top candidates from the genetic algorithm. Our approach operates entirely without paired training data, making it broadly applicable across domains with limited or noisy labels. Quantitatively, our model achieves excellent performance with average BRISQUE and NIQE scores of 19.82 and 3.652, respectively, across all unpaired datasets. Qualitatively, images enhanced by our model exhibit significantly improved visibility in shadowed regions, a natural balance of contrast, and richer fine detail without introducing noticeable artifacts. This work opens new directions for unsupervised image enhancement where semantic consistency is critical.
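The enhancement transform and the multi-objective fitness could look roughly like the sketch below, with every objective written as "lower is better" for a minimiser such as NSGA-II; the exact objective definitions, value ranges, and the perceptual_distance callable are assumptions, not the paper's formulation.

```python
import numpy as np

def enhance(image: np.ndarray, brightness: float, contrast: float, gamma: float) -> np.ndarray:
    """Apply global brightness/contrast/gamma adjustments to an image scaled to [0, 1]."""
    adjusted = np.clip(contrast * (image - 0.5) + 0.5 + brightness, 1e-6, 1.0)
    return adjusted ** gamma

def objectives(image: np.ndarray, params, perceptual_distance):
    """Objective vector for the genetic search (all terms to be minimised).

    perceptual_distance: callable comparing deep features of the original and enhanced images.
    """
    enhanced = enhance(image, *params)
    hist, _ = np.histogram(enhanced, bins=256, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1) + 1e-12
    entropy = -np.sum(p * np.log2(p))
    return (
        -entropy,                              # maximise image entropy
        perceptual_distance(image, enhanced),  # preserve semantic similarity
        abs(enhanced.mean() - 0.5),            # keep brightness near mid-grey
    )
```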
https://arxiv.org/abs/2505.11246
This study introduces an enhanced approach to video super-resolution by extending the Super-Resolution Generative Adversarial Network (SRGAN), an ordinary Single-Image Super-Resolution (SISR) architecture, to handle spatio-temporal data. While SRGAN has proven effective for single-image enhancement, its design does not account for the temporal continuity required in video processing. To address this, a modified framework that incorporates 3D Non-Local Blocks is proposed, enabling the model to capture relationships across both spatial and temporal dimensions. An experimental training pipeline is developed, based on patch-wise learning and advanced data degradation techniques, to simulate real-world video conditions and learn from both local and global structures and details. This helps the model generalize better and maintain stability across varying video content while preserving overall structure in addition to pixel-wise correctness. Two model variants, one larger and one more lightweight, are presented to explore the trade-offs between performance and efficiency. The results demonstrate improved temporal coherence, sharper textures, and fewer visual artifacts compared to traditional single-image methods. This work contributes to the development of practical, learning-based solutions for video enhancement tasks, with potential applications in streaming, gaming, and digital restoration.
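A 3D non-local block in the embedded-Gaussian form can be written compactly as below, computing attention over all space-time positions of a (B, C, T, H, W) feature volume; the channel widths and residual form are generic choices, not necessarily those of the proposed model.

```python
import torch
import torch.nn as nn

class NonLocalBlock3D(nn.Module):
    """Minimal embedded-Gaussian non-local block over spatio-temporal features."""

    def __init__(self, channels: int):
        super().__init__()
        inter = max(channels // 2, 1)
        self.theta = nn.Conv3d(channels, inter, kernel_size=1)
        self.phi = nn.Conv3d(channels, inter, kernel_size=1)
        self.g = nn.Conv3d(channels, inter, kernel_size=1)
        self.out = nn.Conv3d(inter, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W)
        B, C, T, H, W = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, THW, C')
        k = self.phi(x).flatten(2)                     # (B, C', THW)
        v = self.g(x).flatten(2).transpose(1, 2)       # (B, THW, C')
        attn = torch.softmax(q @ k, dim=-1)            # pairwise affinities across space-time
        y = (attn @ v).transpose(1, 2).reshape(B, -1, T, H, W)
        return x + self.out(y)                         # residual connection
```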
https://arxiv.org/abs/2505.10589
Low-light image enhancement (LLIE) is a fundamental task in computational photography, aiming to improve illumination, reduce noise, and enhance image quality. While recent advancements focus on designing increasingly complex neural network models, we observe a peculiar phenomenon: resetting certain parameters to random values unexpectedly improves enhancement performance for some images. Drawing inspiration from biological genes, we term this phenomenon the gene effect. The gene effect limits enhancement performance, as even random parameters can sometimes outperform learned ones, preventing models from fully utilizing their capacity. In this paper, we investigate the reason and propose a solution. Based on our observations, we attribute the gene effect to static parameters, analogous to how fixed genetic configurations become maladaptive when environments change. Inspired by biological evolution, where adaptation to new environments relies on gene mutation and recombination, we propose parameter dynamic evolution (PDE) to adapt to different images and mitigate the gene effect. PDE employs a parameter orthogonal generation technique and the corresponding generated parameters to simulate gene recombination and gene mutation, separately. Experiments validate the effectiveness of our techniques. The code will be released to the public.
https://arxiv.org/abs/2505.09196
In low-light environments, the performance of computer vision algorithms often deteriorates significantly, adversely affecting key vision tasks such as segmentation, detection, and classification. With the rapid advancement of deep learning, its application to low-light image processing has attracted widespread attention and seen significant progress in recent years. However, there remains a lack of comprehensive surveys that systematically examine how recent deep-learning-based low-light image enhancement methods function and evaluate their effectiveness in enhancing downstream vision tasks. To address this gap, this review provides a detailed elaboration on how various recent approaches (from 2020 onward) operate and on their enhancement mechanisms, supplemented with clear illustrations. It also investigates the impact of different enhancement techniques on subsequent vision tasks, critically analyzing their strengths and limitations. Additionally, it proposes future research directions. This review serves as a useful reference for selecting low-light image enhancement techniques and optimizing vision task performance in low-light conditions.
https://arxiv.org/abs/2505.05759
Alzheimer's Disease (AD) is a neurodegenerative disorder characterized by amyloid-beta plaques and tau neurofibrillary tangles, which serve as key histopathological features. The identification and segmentation of these lesions are crucial for understanding AD progression but remain challenging due to the lack of large-scale annotated datasets and the impact of staining variations on automated image analysis. Deep learning has emerged as a powerful tool for pathology image segmentation; however, model performance is significantly influenced by variations in staining characteristics, necessitating effective stain normalization and enhancement techniques. In this study, we address these challenges by introducing an open-source dataset (ADNP-15) of neuritic plaques (i.e., amyloid deposits combined with a crown of dystrophic tau-positive neurites) in human brain whole slide images. We establish a comprehensive benchmark by evaluating five widely adopted deep learning models across four stain normalization techniques, providing deeper insights into their influence on neuritic plaque segmentation. Additionally, we propose a novel image enhancement method that improves segmentation accuracy, particularly in complex tissue structures, by enhancing structural details and mitigating staining inconsistencies. Our experimental results demonstrate that this enhancement strategy significantly boosts model generalization and segmentation accuracy. All datasets and code are open-source, ensuring transparency and reproducibility while enabling further advancements in the field.
https://arxiv.org/abs/2505.05041
This paper presents a novel Two-Stage Diffusion Model (TS-Diff) for enhancing extremely low-light RAW images. In the pre-training stage, TS-Diff synthesizes noisy images by constructing multiple virtual cameras based on a noise space. Camera Feature Integration (CFI) modules are then designed to enable the model to learn generalizable features across diverse virtual cameras. During the aligning stage, CFIs are averaged to create a target-specific CFI^T, which is fine-tuned using a small amount of real RAW data to adapt to the noise characteristics of specific cameras. A structural reparameterization technique further simplifies CFI^T for efficient deployment. To address color shifts during the diffusion process, a color corrector is introduced to ensure color consistency by dynamically adjusting global color distributions. Additionally, a novel dataset, QID, is constructed, featuring quantifiable illumination levels and a wide dynamic range, providing a comprehensive benchmark for training and evaluation under extreme low-light conditions. Experimental results demonstrate that TS-Diff achieves state-of-the-art performance on multiple datasets, including QID, SID, and ELD, excelling in denoising, generalization, and color consistency across various cameras and illumination levels. These findings highlight the robustness and versatility of TS-Diff, making it a practical solution for low-light imaging applications. Source codes and models are available at this https URL.
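The aligning-stage averaging of per-camera CFI modules into a single target CFI^T can be sketched as a plain parameter average over identically structured modules, which would then be fine-tuned on a small amount of real RAW data; this is an illustration under those assumptions, not the released implementation.

```python
import copy
import torch

def average_camera_feature_modules(cfi_modules):
    """Average the parameters of per-camera CFI modules into a single target module.

    cfi_modules: list of identically structured torch.nn.Module instances
                 (one per virtual camera).
    """
    target = copy.deepcopy(cfi_modules[0])
    with torch.no_grad():
        for name, param in target.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in cfi_modules])
            param.copy_(stacked.mean(dim=0))   # element-wise mean across cameras
    return target  # fine-tune this module on real RAW data from the target camera
```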
https://arxiv.org/abs/2505.04281