The preservation of the Amazon Rainforest is one of the global priorities in combating climate change, protecting biodiversity, and safeguarding indigenous cultures. The Satellite-based Monitoring Project of Deforestation in the Brazilian Legal Amazon (PRODES), a project of the National Institute for Space Research (INPE), stands out as a fundamental initiative in this effort, annually monitoring deforested areas not only in the Amazon but also in other Brazilian biomes. Recently, machine learning models have been developed using PRODES data to support this effort through the comparative analysis of multitemporal satellite images, treating deforestation detection as a change detection problem. However, existing approaches present significant limitations: models evaluated in the literature still show unsatisfactory effectiveness, many do not incorporate modern architectures such as those based on self-attention mechanisms, and there is a lack of methodological standardization that would allow direct comparisons between studies. In this work, we address these gaps by evaluating various change detection models on a unified dataset, including fully convolutional models and networks incorporating Transformer-based self-attention mechanisms. We investigate the impact of different pre- and post-processing techniques, such as filtering predicted deforested areas by the size of their connected components, texture replacement, and image enhancement, and we demonstrate that such techniques can significantly improve the effectiveness of individual models. Additionally, we test different strategies for combining the evaluated models to achieve results superior to those obtained individually, reaching an F1-score of 80.41%, comparable to other recent works in the literature.
https://arxiv.org/abs/2512.08075
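The connected-component size filtering described in the abstract above can be sketched in a few lines. This is an illustrative post-processing step, not the paper's implementation; the threshold `min_pixels` and the 4-connectivity choice are assumptions.

```python
def filter_small_components(mask, min_pixels):
    """Remove predicted deforestation blobs smaller than min_pixels.

    mask: 2-D list of 0/1 values; components use 4-connectivity.
    Returns a new mask keeping only components of size >= min_pixels.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                # Flood-fill one connected component.
                stack, comp = [(i, j)], []
                seen[i][j] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                # Keep the component only if it is large enough.
                if len(comp) >= min_pixels:
                    for y, x in comp:
                        out[y][x] = 1
    return out
```

In practice the threshold would be tuned on a validation split, since PRODES itself only maps deforestation polygons above a minimum area.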
Endoscopic surgery relies on intraoperative video, making image quality a decisive factor for surgical safety and efficacy. Yet, endoscopic videos are often degraded by uneven illumination, tissue scattering, occlusions, and motion blur, which obscure critical anatomical details and complicate surgical manipulation. Although deep learning-based methods have shown promise in image enhancement, most existing approaches remain too computationally demanding for real-time surgical use. To address this challenge, we propose a degradation-aware framework for endoscopic video enhancement, which enables real-time, high-quality enhancement by propagating degradation representations across frames. In our framework, degradation representations are first extracted from images using contrastive learning. We then introduce a fusion mechanism that modulates image features with these representations to guide a single-frame enhancement model, which is trained with a cycle-consistency constraint between degraded and restored images to improve robustness and generalization. Experiments demonstrate that our framework achieves a superior balance between performance and efficiency compared with several state-of-the-art methods. These results highlight the effectiveness of degradation-aware modeling for real-time endoscopic video enhancement. More broadly, they suggest that implicitly learning and propagating degradation representations offers a practical pathway toward clinical application.
https://arxiv.org/abs/2512.07253
Image enhancement improves visual quality and helps reveal details that are hard to see in the original image. In medical imaging, it can support clinical decision-making, but current models often over-edit. This can distort organs, create false findings, and miss small tumors because these models do not understand anatomy or contrast dynamics. We propose SMILE, an anatomy-aware diffusion model that learns how organs are shaped and how they take up contrast. It enhances only clinically relevant regions while leaving all other areas unchanged. SMILE introduces three key ideas: (1) structure-aware supervision that follows true organ boundaries and contrast patterns; (2) registration-free learning that works directly with unaligned multi-phase CT scans; (3) unified inference that provides fast and consistent enhancement across all contrast phases. Across six external datasets, SMILE outperforms existing methods in image quality (14.2% higher SSIM, 20.6% higher PSNR, 50% better FID) and in clinical usefulness by producing anatomically accurate and diagnostically meaningful images. SMILE also improves cancer detection from non-contrast CT, raising the F1 score by up to 10 percent.
https://arxiv.org/abs/2512.07251
Underwater images often suffer from severe color distortion, low contrast, and a hazy appearance due to wavelength-dependent light absorption and scattering. Simultaneously, existing deep learning models exhibit high computational complexity, which limits their practical deployment for real-time underwater applications. To address these challenges, this paper presents a novel underwater image enhancement model, called Adaptive Frequency Fusion and Illumination Aware Network (AQUA-Net). It integrates a residual encoder-decoder with dual auxiliary branches, which operate in the frequency and illumination domains. The frequency fusion encoder enriches spatial representations with frequency cues from the Fourier domain and preserves fine textures and structural details. Inspired by Retinex, the illumination-aware decoder performs adaptive exposure correction through a learned illumination map that separates reflectance from lighting effects. This joint spatial, frequency, and illumination design enables the model to restore color balance, visual contrast, and perceptual realism under diverse underwater conditions. Additionally, we present a high-resolution, real-world underwater video-derived dataset from the Mediterranean Sea, which captures challenging deep-sea conditions with realistic visual degradations to enable robust evaluation and development of deep learning models. Extensive experiments on multiple benchmark datasets show that AQUA-Net performs on par with SOTA methods in both qualitative and quantitative evaluations while using fewer parameters. Ablation studies further confirm that the frequency and illumination branches provide complementary contributions that improve visibility and color representation. Overall, the proposed model shows strong generalization capability and robustness, and it provides an effective solution for real-world underwater imaging applications.
https://arxiv.org/abs/2512.05960
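The idea of enriching spatial features with Fourier-domain cues, as in AQUA-Net's frequency fusion encoder, rests on splitting a signal into low- and high-frequency bands and recombining them. A minimal 1-D sketch with a naive DFT (the cutoff index and the hard band split are illustrative assumptions, not the paper's design):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def low_high_split(x, cutoff):
    """Split x into low- and high-frequency components.

    Frequencies with index distance <= cutoff from DC go to the low band;
    the rest go to the high band. By linearity, low + high == x.
    """
    X = dft(x)
    n = len(x)
    low = [X[k] if min(k, n - k) <= cutoff else 0 for k in range(n)]
    high = [0 if min(k, n - k) <= cutoff else X[k] for k in range(n)]
    return idft(low), idft(high)
```

In a network, the low band carries global structure (illumination, color cast) while the high band carries the fine textures the encoder is meant to preserve.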
Diffusion models have achieved remarkable success in low-light image enhancement through Retinex-based decomposition, yet their requirement for hundreds of iterative sampling steps severely limits practical deployment. While recent consistency models offer promising one-step generation for \textit{unconditional synthesis}, their application to \textit{conditional enhancement} remains unexplored. We present \textbf{Consist-Retinex}, the first framework adapting consistency modeling to Retinex-based low-light enhancement. Our key insight is that conditional enhancement requires fundamentally different training dynamics than unconditional generation: standard consistency training focuses on low-noise regions near the data manifold, while conditional mapping critically depends on large-noise regimes that bridge degraded inputs to enhanced outputs. We introduce two core innovations: (1) a \textbf{dual-objective consistency loss} combining temporal consistency with ground-truth alignment under randomized time sampling, providing full-spectrum supervision for stable convergence; and (2) an \textbf{adaptive noise-emphasized sampling strategy} that prioritizes training on large-noise regions essential for one-step conditional generation. On VE-LOL-L, Consist-Retinex achieves \textbf{state-of-the-art performance with single-step sampling} (\textbf{PSNR: 25.51 vs. 23.41, FID: 44.73 vs. 49.59} compared to Diff-Retinex++), while requiring only \textbf{1/8 of the training budget} relative to the 1000-step Diff-Retinex baseline.
https://arxiv.org/abs/2512.08982
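A noise-emphasized timestep sampler of the kind the Consist-Retinex abstract describes can be sketched as importance sampling over the diffusion time axis. The polynomial weighting below is an assumption for illustration; the paper's actual adaptive schedule is not specified in the abstract.

```python
import random

def sample_timestep(num_steps, emphasis=2.0, rng=None):
    """Draw a timestep with probability proportional to ((t+1)/num_steps)**emphasis.

    emphasis > 0 biases sampling toward large-noise (late) timesteps,
    which the conditional mapping depends on; emphasis = 0 is uniform.
    """
    rng = rng or random
    weights = [((t + 1) / num_steps) ** emphasis for t in range(num_steps)]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for t, w in enumerate(weights):
        acc += w
        if r <= acc:
            return t
    return num_steps - 1  # numerical guard
```

With `emphasis=2.0` and 100 steps, the expected sampled index sits around three quarters of the schedule instead of the middle, so large-noise training pairs dominate each batch.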
Underwater video pairs are fairly difficult to obtain due to complex underwater imaging conditions. As a result, most existing underwater video enhancement methods simply apply a single-image enhancement model frame by frame, which naturally lacks temporal consistency. To relieve this problem, we rethink the temporal manifold inherent in natural videos and observe a temporal consistency prior in dynamic scenes from the local temporal frequency perspective. Building upon this specific prior and the absence of paired data, we propose an implicit representation for enhanced video signals, conducted in a wavelet-based temporal consistency field, WaterWave. Specifically, under the constraints of the prior, we progressively filter and attenuate the inconsistent components while preserving motion details and scenes, achieving a natural-flowing video. Furthermore, to represent temporal frequency bands more accurately, an underwater flow correction module is designed to rectify estimated flows by accounting for transmission in underwater scenes. Extensive experiments demonstrate that WaterWave significantly enhances the quality of videos generated using single-image underwater enhancements. Additionally, our method demonstrates high potential in downstream underwater tracking tasks, such as UOSTrack and MAT, outperforming the original video by a large margin, i.e., 19.7% and 9.7% in precision, respectively.
https://arxiv.org/abs/2512.05492
When applied sequentially to video, frame-based networks often exhibit temporal inconsistency - for example, outputs that flicker between frames. This problem is amplified when the network inputs contain time-varying corruptions. In this work, we introduce a general approach for adapting frame-based models for stable and robust inference on video. We describe a class of stability adapters that can be inserted into virtually any architecture and a resource-efficient training process that can be performed with a frozen base network. We introduce a unified conceptual framework for describing temporal stability and corruption robustness, centered on a proposed accuracy-stability-robustness loss. By analyzing the theoretical properties of this loss, we identify the conditions where it produces well-behaved stabilizer training. Our experiments validate our approach on several vision tasks including denoising (NAFNet), image enhancement (HDRNet), monocular depth (Depth Anything v2), and semantic segmentation (DeepLabv3+). Our method improves temporal stability and robustness against a range of image corruptions (including compression artifacts, noise, and adverse weather), while preserving or improving the quality of predictions.
https://arxiv.org/abs/2512.03014
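The accuracy-stability-robustness loss named in the abstract above can be sketched as a weighted sum of three mean-squared-error terms. The exact composition in the paper is not given in the abstract, so the term definitions and weights below are assumptions for illustration.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def asr_loss(pred, target, prev_pred, clean_pred, lam_s=0.1, lam_r=0.1):
    """Accuracy-stability-robustness loss (sketch).

    accuracy  : pred should match the ground truth,
    stability : pred should match the previous frame's output,
    robustness: pred on a corrupted input should match the output
                on the corresponding clean input.
    """
    return (mse(pred, target)
            + lam_s * mse(pred, prev_pred)
            + lam_r * mse(pred, clean_pred))
```

Tuning `lam_s` and `lam_r` trades per-frame accuracy against flicker suppression and corruption robustness, which matches the trade-off the paper analyzes theoretically.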
In low-light environments like nighttime driving, image degradation severely challenges in-vehicle camera safety. Since existing enhancement algorithms are often too computationally intensive for vehicular applications, we propose UltraFast-LieNET, a lightweight multi-scale shifted convolutional network for real-time low-light image enhancement. We introduce a Dynamic Shifted Convolution (DSConv) kernel with only 12 learnable parameters for efficient feature extraction. By integrating DSConv with varying shift distances, a Multi-scale Shifted Residual Block (MSRB) is constructed to significantly expand the receptive field. To mitigate lightweight network instability, a residual structure and a novel multi-level gradient-aware loss function are incorporated. UltraFast-LieNET allows flexible parameter configuration, with a minimum size of only 36 parameters. Results on the LOLI-Street dataset show a PSNR of 26.51 dB, outperforming state-of-the-art methods by 4.6 dB while utilizing only 180 parameters. Experiments across four benchmark datasets validate its superior balance of real-time performance and enhancement quality under limited resources. Code is available at https://github.com/YuhanChen2024/UltraFast-LiNET
https://arxiv.org/abs/2512.02965
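The core of a shifted convolution, as in the DSConv kernel above, is that each learnable weight multiplies a spatially shifted copy of the feature map rather than a dense neighborhood, so a handful of (shift, weight) pairs can cover a large receptive field. A minimal sketch (the shift set and zero-padding behavior are assumptions, not the paper's exact kernel):

```python
def shifted_conv(x, shifts, weights):
    """Sum of weighted, shifted copies of a 2-D feature map.

    x       : 2-D list of floats
    shifts  : list of (dy, dx) integer offsets
    weights : one learnable scalar per shift
    Out-of-range reads are treated as zeros (zero padding).
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for (dy, dx), wk in zip(shifts, weights):
        for i in range(h):
            for j in range(w):
                si, sj = i + dy, j + dx
                if 0 <= si < h and 0 <= sj < w:
                    out[i][j] += wk * x[si][sj]
    return out
```

Stacking such operators with different shift distances is what lets the MSRB grow its receptive field at a near-zero parameter cost.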
Robust 3D geometry estimation from videos is critical for applications such as autonomous navigation, SLAM, and 3D scene reconstruction. Recent methods like DUSt3R demonstrate that regressing dense pointmaps from image pairs enables accurate and efficient pose-free reconstruction. However, existing RGB-only approaches struggle under real-world conditions involving dynamic objects and extreme illumination, due to the inherent limitations of conventional cameras. In this paper, we propose EAG3R, a novel geometry estimation framework that augments pointmap-based reconstruction with asynchronous event streams. Built upon the MonST3R backbone, EAG3R introduces two key innovations: (1) a retinex-inspired image enhancement module and a lightweight event adapter with SNR-aware fusion mechanism that adaptively combines RGB and event features based on local reliability; and (2) a novel event-based photometric consistency loss that reinforces spatiotemporal coherence during global optimization. Our method enables robust geometry estimation in challenging dynamic low-light scenes without requiring retraining on night-time data. Extensive experiments demonstrate that EAG3R significantly outperforms state-of-the-art RGB-only baselines across monocular depth estimation, camera pose tracking, and dynamic reconstruction tasks.
https://arxiv.org/abs/2512.00771
The rapid growth of Artificial Intelligence-Generated Content (AIGC) raises concerns about the authenticity of digital media. In this context, image self-recovery, reconstructing original content from its manipulated version, offers a practical solution for understanding the attacker's intent and restoring trustworthy data. However, existing methods often fail to accurately recover tampered regions, falling short of the primary goal of self-recovery. To address this challenge, we propose ReImage, a neural watermarking-based self-recovery framework that embeds a shuffled version of the target image into itself as a watermark. We design a generator that produces watermarks optimized for neural watermarking and introduce an image enhancement module to refine the recovered image. We further analyze and resolve key limitations of shuffled watermarking, enabling its effective use in self-recovery. We demonstrate that ReImage achieves state-of-the-art performance across diverse tampering scenarios, consistently producing high-quality recovered images. The code and pretrained models will be released upon publication.
https://arxiv.org/abs/2511.22936
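The "shuffled version of the target image embedded into itself" idea from the ReImage abstract reduces, at its simplest, to a keyed permutation of pixels that can be exactly inverted at recovery time. This sketch omits the neural embedding entirely and only shows the permutation bookkeeping; the key-based seeding is an assumption.

```python
import random

def shuffle_pixels(pixels, key):
    """Apply a keyed permutation to a flat pixel list (the watermark payload)."""
    rng = random.Random(key)
    idx = list(range(len(pixels)))
    rng.shuffle(idx)  # idx[pos] = source index placed at position pos
    return [pixels[i] for i in idx], idx

def unshuffle_pixels(shuffled, idx):
    """Invert the permutation to recover the original pixel layout."""
    out = [None] * len(shuffled)
    for pos, src in enumerate(idx):
        out[src] = shuffled[pos]
    return out
```

Shuffling spreads every local region of the image across the whole watermark, which is why a locally tampered area can still be reconstructed from the surviving, globally dispersed payload.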
In the field of autonomous driving, camera-based perception models are mostly trained on clear weather data. Models that focus on addressing specific weather challenges are unable to adapt to various weather changes and primarily prioritize their weather removal characteristics. Our study introduces a semantic-enabled network for object detection in diverse weather conditions. In our analysis, semantic information enables the model to generate plausible content for missing areas, understand object boundaries, and preserve visual coherency and realism across both filled-in and existing portions of the image, all of which are conducive to image transformation and object recognition. In terms of implementation, our architecture consists of a Preprocessing Unit (PPU) and a Detection Unit (DTU), where the PPU utilizes a U-shaped net enriched by semantics to refine degraded images, and the DTU integrates this semantic information for object detection using a modified YOLO network. Our method pioneers the use of semantic data for all-weather transformations, resulting in an increase of 1.47\% to 8.80\% in mAP compared to existing methods across benchmark datasets of different weather. This highlights the potency of semantics in image enhancement and object detection, offering a comprehensive approach to improving object detection performance. Code will be available at this https URL.
https://arxiv.org/abs/2511.22142
Low-light image enhancement is an essential computer vision task to improve image contrast and to decrease the effects of color bias and noise. Many existing interpretable deep-learning algorithms exploit the Retinex theory as the basis of model design. However, previous Retinex-based algorithms, which treat reflecting objects as ideal Lambertian surfaces, ignore specular reflection in the modeling process and construct their physical constraints in image space, limiting the generalization of the model. To address this issue, we preserve the specular reflection coefficient and reformulate the original physical constraints of the imaging process based on the Kubelka-Munk theory, thereby constructing constraint relationships between illumination, reflection, and detection, the so-called triple physical constraints (TPCs) theory. Based on this theory, the physical constraints are constructed in the feature space of the model to obtain the TPC network (TPCNet). Comprehensive quantitative and qualitative benchmark and ablation experiments confirm that these constraints effectively improve the performance metrics and visual quality without introducing new parameters, and demonstrate that our TPCNet outperforms other state-of-the-art methods on 10 datasets.
https://arxiv.org/abs/2511.22052
Traffic signboards are vital for road safety and intelligent transportation systems, enabling navigation and autonomous driving. Yet, recognizing traffic signs at night remains challenging due to visual noise and scarcity of public nighttime datasets. Despite advances in vision architectures, existing methods struggle with robustness under low illumination and fail to leverage complementary multimodal cues effectively. To overcome these limitations, first, we introduce INTSD, a large-scale dataset comprising street-level night-time images of traffic signboards collected across diverse regions of India. The dataset spans 41 traffic signboard classes captured under varying lighting and weather conditions, providing a comprehensive benchmark for both detection and classification tasks. To benchmark INTSD for night-time sign recognition, we conduct extensive evaluations using state-of-the-art detection and classification models. Second, we propose LENS-Net, which integrates an adaptive image enhancement detector for joint illumination correction and sign localization, followed by a structured multimodal CLIP-GCNN classifier that leverages cross-modal attention and graph-based reasoning for robust and semantically consistent recognition. Our method surpasses existing frameworks, with ablation studies confirming the effectiveness of its key components. The dataset and code for LENS-Net are publicly available for research.
https://arxiv.org/abs/2511.17183
Underwater Image Enhancement (UIE) aims to restore visibility and correct color distortions caused by wavelength-dependent absorption and scattering. Recent hybrid approaches, which couple domain priors with modern deep neural architectures, have achieved strong performance but incur high computational cost, limiting their practicality in real-time scenarios. In this work, we propose WWE-UIE, a compact and efficient enhancement network that integrates three interpretable priors. First, adaptive white balance alleviates the strong wavelength-dependent color attenuation, particularly the dominance of blue-green tones. Second, a wavelet-based enhancement block (WEB) performs multi-band decomposition, enabling the network to capture both global structures and fine textures, which are critical for underwater restoration. Third, a gradient-aware module (SGFB) leverages Sobel operators with learnable gating to explicitly preserve edge structures degraded by scattering. Extensive experiments on benchmark datasets demonstrate that WWE-UIE achieves competitive restoration quality with substantially fewer parameters and FLOPs, enabling real-time inference on resource-limited platforms. Ablation studies and visualizations further validate the contribution of each component. The source code is available at this https URL.
https://arxiv.org/abs/2511.16321
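The gradient-aware module in WWE-UIE builds on standard Sobel operators. A minimal sketch of the Sobel gradient magnitude it starts from (the learnable gating around it is the paper's addition and is not shown here):

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude at interior pixels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = sum(SOBEL_X[di][dj] * img[i - 1 + di][j - 1 + dj]
                     for di in range(3) for dj in range(3))
            gy = sum(SOBEL_Y[di][dj] * img[i - 1 + di][j - 1 + dj]
                     for di in range(3) for dj in range(3))
            out[i][j] = (gx * gx + gy * gy) ** 0.5
    return out
```

Feeding this edge map back into the network, modulated by a learned gate, is what lets the model explicitly protect edge structures that scattering would otherwise wash out.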
Imaging in low-light environments is challenging due to reduced scene radiance, which leads to elevated sensor noise and reduced color saturation. Most learning-based low-light enhancement methods rely on paired training data captured under a single low-light condition and a well-lit reference. The lack of radiance diversity limits our understanding of how enhancement techniques perform across varying illumination intensities. We introduce the Multi-Illumination Low-Light (MILL) dataset, containing images captured at diverse light intensities under controlled conditions with fixed camera settings and precise illuminance measurements. MILL enables comprehensive evaluation of enhancement algorithms across variable lighting conditions. We benchmark several state-of-the-art methods and reveal significant performance variations across intensity levels. Leveraging the unique multi-illumination structure of our dataset, we propose improvements that enhance robustness across diverse illumination scenarios. Our modifications achieve up to 10 dB PSNR improvement for DSLR and 2 dB for the smartphone on Full HD images.
https://arxiv.org/abs/2511.15496
Underwater image restoration and enhancement are crucial for correcting color distortion and restoring image details, thereby establishing a fundamental basis for subsequent underwater visual tasks. However, current deep learning methodologies in this area are frequently constrained by the scarcity of high-quality paired datasets. Since it is difficult to obtain pristine reference labels in underwater scenes, existing benchmarks often rely on manually selected results from enhancement algorithms, providing debatable reference images that lack globally consistent color and authentic supervision. This limits the model's capabilities in color restoration, image enhancement, and generalization. To overcome this limitation, we propose using in-air natural images as unambiguous reference targets and translating them into underwater-degraded versions, thereby constructing synthetic datasets that provide authentic supervision signals for model learning. Specifically, we establish a generative data framework based on unpaired image-to-image translation, producing a large-scale dataset that covers 6 representative underwater degradation types. The framework constructs synthetic datasets with precise ground-truth labels, which facilitate the learning of an accurate mapping from degraded underwater images to their pristine scene appearances. Extensive quantitative and qualitative experiments across 6 representative network architectures and 3 independent test sets show that models trained on our synthetic data achieve comparable or superior color restoration and generalization performance to those trained on existing benchmarks. This research provides a reliable and scalable data-driven solution for underwater image restoration and enhancement. The generated dataset is publicly available at: this https URL.
https://arxiv.org/abs/2511.14521
Enhancing low-light traffic images is crucial for reliable perception in autonomous driving, intelligent transportation, and urban surveillance systems. Nighttime and dimly lit traffic scenes often suffer from poor visibility due to low illumination, noise, motion blur, non-uniform lighting, and glare from vehicle headlights or street lamps, which hinder tasks such as object detection and scene understanding. To address these challenges, we propose a fully unsupervised multi-stage deep learning framework for low-light traffic image enhancement. The model decomposes images into illumination and reflectance components, progressively refined by three specialized modules: (1) Illumination Adaptation, for global and local brightness correction; (2) Reflectance Restoration, for noise suppression and structural detail recovery using spatial-channel attention; and (3) Over-Exposure Compensation, for reconstructing saturated regions and balancing scene luminance. The network is trained using self-supervised reconstruction, reflectance smoothness, perceptual consistency, and domain-aware regularization losses, eliminating the need for paired ground-truth images. Experiments on general and traffic-specific datasets demonstrate superior performance over state-of-the-art methods in both quantitative metrics (PSNR, SSIM, LPIPS, NIQE) and qualitative visual quality. Our approach enhances visibility, preserves structure, and improves downstream perception reliability in real-world low-light traffic scenarios.
https://arxiv.org/abs/2511.17612
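The illumination/reflectance decomposition that the framework above builds on can be sketched with a crude local-mean illumination estimate and a gamma curve. Both the mean-filter estimate and the gamma value are illustrative assumptions; the paper's three learned modules replace exactly these hand-crafted steps.

```python
def retinex_enhance(img, gamma=0.45, eps=1e-6):
    """Retinex-style enhancement of a 2-D luminance map in [0, 1].

    1. Estimate illumination L as a 3x3 local mean.
    2. Recover reflectance R = I / L.
    3. Recompose with a brightened illumination: R * L**gamma.
    """
    h, w = len(img), len(img[0])
    illum = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]
            illum[i][j] = sum(vals) / len(vals)
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            refl = img[i][j] / (illum[i][j] + eps)
            out[i][j] = min(1.0, refl * illum[i][j] ** gamma)
    return out
```

Because gamma is applied only to the illumination estimate, dark regions are lifted while reflectance detail (and hence structure) is preserved, which is the property the unsupervised losses in the paper are designed to enforce.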
The Low-Light Image Enhancement (LLIE) task aims to improve contrast while restoring details and textures for images captured in low-light conditions. The HVI color space has enabled significant progress in this task through precise decoupling of chrominance and luminance. However, in the interaction between the chrominance and luminance branches, substantial distributional differences between the two branches, prevalent in natural images, limit complementary feature extraction, and luminance errors are propagated to chrominance channels through the nonlinear parameter. Furthermore, in the interaction between different chrominance branches, images with large homogeneous-color regions usually exhibit weak correlation between chrominance branches due to concentrated distributions. Traditional pixel-wise losses exploit strong inter-branch correlations for co-optimization, causing gradient conflicts in weakly correlated regions. Therefore, we propose an Inter-Chrominance and Luminance Interaction (ICLR) framework including a Dual-stream Interaction Enhancement Module (DIEM) and a Covariance Correction Loss (CCL). The DIEM improves the extraction of complementary information along two dimensions, fusion and enhancement, respectively. The CCL utilizes luminance residual statistics to penalize chrominance errors and balances gradient conflicts by constraining the covariance of the chrominance branches. Experimental results on multiple datasets show that the proposed ICLR framework outperforms state-of-the-art methods.
https://arxiv.org/abs/2511.13607
Medical image enhancement is clinically valuable, but existing methods require large-scale datasets to learn complex pixel-level mappings. However, the substantial training and storage costs associated with these datasets hinder their practical deployment. While dataset distillation (DD) can alleviate these burdens, existing methods mainly target high-level tasks, where multiple samples share the same label. This many-to-one mapping allows distilled data to capture shared semantics and achieve information compression. In contrast, low-level tasks involve a many-to-many mapping that requires pixel-level fidelity, making low-level DD an underdetermined problem, as a small distilled dataset cannot fully constrain the dense pixel-level mappings. To address this, we propose the first low-level DD method for medical image enhancement. We first leverage anatomical similarities across patients to construct the shared anatomical prior based on a representative patient, which serves as the initialization for the distilled data of different patients. This prior is then personalized for each patient using a Structure-Preserving Personalized Generation (SPG) module, which integrates patient-specific anatomical information into the distilled dataset while preserving pixel-level fidelity. For different low-level tasks, the distilled data is used to construct task-specific high- and low-quality training pairs. Patient-specific knowledge is injected into the distilled data by aligning the gradients computed from networks trained on the distilled pairs with those from the corresponding patient's raw data. Notably, downstream users cannot access raw patient data. Instead, only a distilled dataset containing abstract training information is shared, which excludes patient-specific details and thus preserves privacy.
https://arxiv.org/abs/2511.13106
Ultra-low-field (ULF) MRI promises broader accessibility but suffers from low signal-to-noise ratio (SNR), reduced spatial resolution, and contrasts that deviate from high-field standards. Image-to-image translation can map ULF images to a high-field appearance, yet efficacy is limited by scarce paired training data. Working within the ULF-EnC challenge constraints (50 paired 3D volumes; no external data), we study how task-adapted data augmentations impact a standard deep model for ULF image enhancement. We show that strong, diverse augmentations, including auxiliary tasks on high-field data, substantially improve fidelity. Our submission ranked third by brain-masked SSIM on the public validation leaderboard and fourth by the official score on the final test leaderboard. Code is available at this https URL.
https://arxiv.org/abs/2511.09366