In recent years, learning-based underwater image enhancement (UIE) techniques have rapidly evolved. However, distribution shifts between high-quality enhanced outputs and natural images can hinder semantic cue extraction for downstream vision tasks, thereby limiting the adaptability of existing enhancement models. To address this challenge, this work proposes a new learning mechanism that leverages Vision-Language Models (VLMs) to endow UIE models with semantic-sensitive capabilities. Concretely, our strategy first generates textual descriptions of key objects in a degraded image via VLMs. A text-image alignment model then remaps these descriptions back onto the image to produce a spatial semantic guidance map. This map steers the UIE network through a dual-guidance mechanism that combines cross-attention with an explicit alignment loss, forcing the network to focus its restorative power on semantic-sensitive regions during image reconstruction rather than pursuing globally uniform improvement, thereby ensuring faithful restoration of key object features. Experiments confirm that, when applied to different UIE baselines, our strategy significantly boosts their perceptual quality metrics and improves their performance on detection and segmentation tasks, validating its effectiveness and adaptability.
https://arxiv.org/abs/2603.12773
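As a sketch of the dual-guidance mechanism described above, the snippet below conditions image features on a one-channel spatial semantic guidance map via cross-attention and adds an illustrative alignment loss; the module names, shapes, and exact loss form are assumptions, not the authors' implementation (PyTorch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidanceCrossAttention(nn.Module):
    """Image features attend to a 1-channel spatial semantic guidance map."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.guide_proj = nn.Conv2d(1, dim, 1)  # lift guidance map to feature dim

    def forward(self, feat, guide):  # feat: (B,C,H,W), guide: (B,1,H,W) in [0,1]
        B, C, H, W = feat.shape
        q = feat.flatten(2).transpose(1, 2)                     # queries from image
        kv = self.guide_proj(guide).flatten(2).transpose(1, 2)  # keys/values from map
        out, _ = self.attn(q, kv, kv)
        return feat + out.transpose(1, 2).view(B, C, H, W)      # residual injection

def alignment_loss(enhanced, guide):
    """Illustrative explicit alignment term: a gradient-based saliency proxy of
    the enhanced image is pulled toward the guidance map, so restorative effort
    concentrates on semantic-sensitive regions."""
    gray = enhanced.mean(1, keepdim=True)
    gx = F.pad((gray[..., :, 1:] - gray[..., :, :-1]).abs(), (0, 1))
    gy = F.pad((gray[..., 1:, :] - gray[..., :-1, :]).abs(), (0, 0, 0, 1))
    saliency = gx + gy
    saliency = saliency / (saliency.amax(dim=(2, 3), keepdim=True) + 1e-6)
    return F.l1_loss(saliency, guide)
```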
RGB-NIR image registration plays an important role in sensor fusion, image enhancement, and off-road autonomy. In this work, we evaluate both classical and Deep Learning (DL)-based image registration techniques to assess their suitability for off-road forestry applications. NeMAR, trained under six different configurations, demonstrates partial success; however, its GAN-loss instability suggests challenges in preserving geometric consistency. MURF, when tested on off-road forestry data, shows promising large-scale feature alignment during shared-information extraction but struggles with fine details in dense vegetation. Although this is a preliminary evaluation, our study indicates that further refinement is needed for robust, multi-scale registration in off-road forest applications.
https://arxiv.org/abs/2603.11952
Most sRGB-based LLIE methods suffer from entangled luminance and color, while the HSV color space offers only partial decoupling and introduces significant red and black noise artifacts. Recently, the HVI color space has been proposed to address these limitations by enhancing color fidelity through chrominance polarization and intensity compression. However, existing methods can suffer from channel-level inconsistency between luminance and chrominance, and misaligned color distributions may lead to unnatural enhancement results. To address these challenges, we propose Variance-Driven Channel Recalibration for Robust Low-Light Enhancement (VCR), a novel low-light image enhancement framework. VCR consists of two main components: a Channel Adaptive Adjustment (CAA) module, which employs variance-guided feature filtering to focus the model on regions with high variance in intensity and color distribution, and a Color Distribution Alignment (CDA) module, which enforces distribution alignment in the color feature space. These designs enhance perceptual quality under low-light conditions. Experimental results on several benchmark datasets demonstrate that the proposed method achieves state-of-the-art performance compared with existing methods.
https://arxiv.org/abs/2603.10975
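A minimal sketch of what variance-guided channel recalibration could look like, assuming the gate is driven by per-channel spatial variance; the module name follows CAA but the internals are illustrative:

```python
import torch
import torch.nn as nn

class ChannelAdaptiveAdjustment(nn.Module):
    """Sketch of variance-guided feature filtering: channels with high spatial
    variance (strong intensity/color structure) are re-weighted by a learned gate."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (B, C, H, W)
        var = x.var(dim=(2, 3))           # per-channel spatial variance, (B, C)
        gate = self.mlp(var)              # variance-driven channel gate, (B, C)
        return x * gate.unsqueeze(-1).unsqueeze(-1)
```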
Underwater images often suffer from severe degradation caused by light absorption and scattering, leading to color distortion, low contrast, and reduced visibility. Existing Underwater Image Enhancement (UIE) methods can be divided into two categories, i.e., prior-based and learning-based methods. The former rely on rigid physical assumptions that limit adaptability, while the latter often face data scarcity and weak generalization. To address these issues, we propose a Physics-Semantics-Guided Underwater Image Enhancement Network (PSG-UIENet), which couples Retinex-grounded illumination correction with language-informed guidance. The network comprises a Prior-Free Illumination Estimator, a Cross-Modal Text Aligner, and a Semantics-Guided Image Restorer. In particular, the restorer leverages textual descriptions generated by the Contrastive Language-Image Pre-training (CLIP) model to inject high-level semantics as perceptually meaningful guidance. Since no multimodal UIE dataset is publicly available, we also construct a large-scale image-text UIE dataset, LUIQD-TD, which contains 6,418 image-reference-text triplets. To explicitly measure and optimize semantic consistency between textual descriptions and images, we further design an Image-Text Semantic Similarity (ITSS) loss function. To our knowledge, this study makes the first effort to introduce both textual guidance and a multimodal dataset into UIE tasks. Extensive experiments on our dataset and four publicly available datasets demonstrate that the proposed PSG-UIENet achieves superior or comparable performance against fifteen state-of-the-art methods.
https://arxiv.org/abs/2603.07076
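The abstract does not give the ITSS formula, but a natural reading is a cosine-similarity objective between CLIP-style embeddings; the sketch below assumes the embeddings are already produced by any CLIP implementation:

```python
import torch
import torch.nn.functional as F

def itss_loss(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """One plausible form of an image-text semantic similarity loss: maximize the
    cosine similarity between the embedding of the enhanced image and the
    embedding of its textual description; both inputs are (B, D)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return (1.0 - (image_emb * text_emb).sum(dim=-1)).mean()
```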
Event cameras, with their high dynamic range, show great promise for Low-light Image Enhancement (LLIE). Existing works primarily focus on designing effective modal fusion strategies. However, a key challenge is the dual degradation from intrinsic background activity (BA) noise in events and the low signal-to-noise ratio (SNR) of images, which causes severe noise coupling during modal fusion and creates a critical performance bottleneck. We therefore posit that precise event denoising is the prerequisite to unlocking the full potential of event-based fusion. To this end, we propose BiEvLight, a hierarchical and task-aware framework that collaboratively optimizes enhancement and denoising by exploiting their intrinsic interdependence. Specifically, BiEvLight exploits the strong gradient correlation between images and events to build a gradient-guided event denoising prior that alleviates insufficient denoising in heavily noisy regions. Moreover, instead of treating event denoising as a static pre-processing stage, which inevitably incurs a trade-off between over- and under-denoising and cannot adapt to the requirements of a specific enhancement objective, we recast it as a bilevel optimization problem constrained by the enhancement task. Through cross-task interaction, the upper-level denoising problem learns event representations tailored to the lower-level enhancement objective, thereby substantially improving overall enhancement quality. Extensive experiments on the real-world noise dataset SDE demonstrate that our method significantly outperforms state-of-the-art (SOTA) approaches, with average improvements of 1.30 dB in PSNR, 2.03 dB in PSNR*, and 0.047 in SSIM. The code will be publicly available at this https URL.
https://arxiv.org/abs/2603.04975
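Bilevel optimization here means the denoiser (upper level) is trained through the enhancement objective (lower level). A simplified alternating approximation, with all modules and the loss supplied by the caller, might look like this (true bilevel solvers typically use implicit or unrolled gradients):

```python
import torch

def bilevel_step(denoiser, enhancer, events, image, target,
                 opt_den, opt_enh, loss_fn):
    """One alternating step approximating the bilevel scheme: first update the
    enhancer on fixed denoised events, then update the denoiser through the
    (temporarily frozen) enhancer so its event representation is tailored to
    the enhancement objective."""
    # Lower level: enhancer update with denoised events held fixed.
    with torch.no_grad():
        clean_events = denoiser(events)
    lower = loss_fn(enhancer(image, clean_events), target)
    opt_enh.zero_grad()
    lower.backward()
    opt_enh.step()

    # Upper level: denoiser update through the frozen enhancer.
    for p in enhancer.parameters():
        p.requires_grad_(False)
    upper = loss_fn(enhancer(image, denoiser(events)), target)
    opt_den.zero_grad()
    upper.backward()
    opt_den.step()
    for p in enhancer.parameters():
        p.requires_grad_(True)
    return lower.item(), upper.item()
```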
In real underwater environments, downstream image recognition tasks such as semantic segmentation and object detection often face challenges posed by blurring and color inconsistencies. Underwater image enhancement (UIE) has emerged as a promising preprocessing approach, aiming to improve the recognizability of targets in underwater images. However, most existing UIE methods mainly focus on enhancing images for human visual perception and frequently fail to reconstruct the high-frequency details that are critical for task-specific recognition. To address this issue, we propose a Downstream Task-Inspired Underwater Image Enhancement (DTI-UIE) framework, which leverages a human visual perception model to enhance images effectively for underwater vision tasks. Specifically, we design an efficient two-branch network with a task-aware attention module for feature mixing. The network benefits from a multi-stage training framework and a task-driven perceptual loss. Additionally, inspired by human perception, we automatically construct a Task-Inspired UIE Dataset (TI-UIED) using various task-specific networks. Experimental results demonstrate that DTI-UIE significantly improves task performance by generating preprocessed images that benefit downstream tasks such as semantic segmentation, object detection, and instance segmentation. The codes are publicly available at this https URL.
https://arxiv.org/abs/2603.01767
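A task-driven perceptual loss of the kind mentioned above can be sketched as a feature-matching term on a frozen downstream network; `task_net.extract` and the layer names are assumed placeholders, not the paper's API:

```python
import torch
import torch.nn.functional as F

def task_perceptual_loss(task_net, enhanced, reference,
                         layers=("layer2", "layer3")):
    """Match intermediate features of a frozen downstream backbone (e.g., a
    detector or segmenter) between the enhanced image and its reference.
    `task_net.extract(x)` is assumed to return a dict of named feature maps;
    adapt this to your backbone's actual interface."""
    with torch.no_grad():
        ref_feats = task_net.extract(reference)
    enh_feats = task_net.extract(enhanced)
    return sum(F.l1_loss(enh_feats[k], ref_feats[k]) for k in layers)
```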
We propose a simple yet effective UHDPromer, a neural discrimination-prompted Transformer for Ultra-High-Definition (UHD) image restoration and enhancement. UHDPromer is inspired by an interesting observation: neural differences implicitly exist between high-resolution and low-resolution features, and exploring such differences can facilitate low-resolution feature representation. To this end, we first introduce Neural Discrimination Priors (NDP) to measure these differences and then integrate NDP into the proposed Neural Discrimination-Prompted Attention (NDPA) and Neural Discrimination-Prompted Network (NDPN). NDPA reformulates attention by incorporating NDP to globally perceive useful discrimination information, while NDPN explores a continuous gating mechanism guided by NDP to selectively pass beneficial content. To enhance the quality of restored images, we propose a super-resolution-guided reconstruction approach, in which super-resolved low-resolution features guide the final UHD image restoration. Experiments show that UHDPromer achieves the best computational efficiency while maintaining state-of-the-art performance on three UHD image restoration and enhancement tasks: low-light image enhancement, image dehazing, and image deblurring. The source code and pre-trained models will be made available at this https URL.
https://arxiv.org/abs/2603.00853
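One way to picture NDP-guided gating: treat the discrepancy between a feature map and its low-resolution surrogate as the prior, and derive a continuous gate from it. The sketch below is a plausible reading, not the paper's NDPA/NDPN design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NDPGate(nn.Module):
    """Sketch of a neural-discrimination-prompted gate: the prior is taken as
    the difference between a feature map and its downsample-then-upsample
    surrogate, and a continuous gate derived from this prior decides which
    content passes."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, feat):                  # feat: (B, C, H, W)
        low = F.interpolate(F.avg_pool2d(feat, 2), size=feat.shape[-2:],
                            mode="bilinear", align_corners=False)
        ndp = feat - low                      # high/low-resolution discrepancy prior
        return feat * self.gate(ndp)          # continuous, prior-guided gating
```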
Low-field magnetic resonance imaging (MRI) provides affordable access to diagnostic imaging but suffers from prolonged acquisition and limited image quality. Accelerated imaging can be achieved with k-space undersampling, while super-resolution (SR) and image quality transfer (IQT) methods typically rely on spatial-domain post-processing. In this work, we propose a novel framework for reconstructing high-field-like MR images directly from undersampled low-field k-space. Our approach employs a dual-channel k-space U-Net that processes the real and imaginary components of undersampled k-space to restore missing frequency content. Experiments on low-field brain MRI demonstrate that our k-space-driven image enhancement consistently outperforms its spatial-domain counterpart. Furthermore, reconstructions from undersampled k-space achieve image quality comparable to full k-space acquisitions. To the best of our knowledge, this is the first work to investigate low-field MRI SR/IQT directly from undersampled k-space.
https://arxiv.org/abs/2603.00668
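The dual-channel k-space idea can be summarized in a few lines: the real and imaginary parts of the undersampled k-space become two input channels, and the restored spectrum is transformed back to the image domain. The centering convention (ifftshift) is an assumption:

```python
import torch
import torch.nn as nn

def kspace_restore(net: nn.Module, kspace: torch.Tensor) -> torch.Tensor:
    """Sketch of dual-channel k-space restoration. `kspace` is a complex tensor
    (B, H, W) of undersampled measurements; `net` is any U-Net-like model that
    maps (B, 2, H, W) -> (B, 2, H, W)."""
    x = torch.stack((kspace.real, kspace.imag), dim=1)   # two real channels
    restored = net(x)                                    # fill in missing k-space
    k_full = torch.complex(restored[:, 0], restored[:, 1])
    image = torch.fft.ifft2(torch.fft.ifftshift(k_full, dim=(-2, -1)))
    return image.abs()                                   # magnitude image
```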
Low-light images often suffer from low contrast, noise, and color distortion, degrading visual quality and impairing downstream vision tasks. We propose a novel conditional diffusion framework for low-light image enhancement that incorporates a Structured Control Embedding Module (SCEM). SCEM decomposes a low-light image into four informative components: illumination, illumination-invariant features, shadow priors, and color-invariant cues. These components serve as control signals that condition a U-Net-based diffusion model trained with a simplified noise-prediction loss. The proposed SCEM-equipped diffusion method thus enforces structured enhancement guided by physical priors. In experiments, our model is trained only on the LOLv1 dataset and evaluated without fine-tuning on LOLv2-real, LSRW, DICM, MEF, and LIME. The method achieves state-of-the-art performance in quantitative and perceptual metrics, demonstrating strong generalization across benchmarks. Code: this https URL.
https://arxiv.org/abs/2603.00337
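The abstract names the four SCEM components but not how they are computed; the sketch below uses plausible stand-ins (Retinex-style illumination split, a thresholded shadow prior, normalized chromaticity) together with the standard simplified noise-prediction objective:

```python
import torch
import torch.nn.functional as F

def scem_controls(low_light):
    """Illustrative stand-ins for the four control signals: illumination (max
    over RGB), a Retinex-style illumination-invariant reflectance, a thresholded
    shadow prior, and normalized chromaticity as a color-invariant cue."""
    illum = low_light.max(dim=1, keepdim=True).values              # (B,1,H,W)
    reflect = low_light / (illum + 1e-4)                           # (B,3,H,W)
    shadow = (illum < 0.15).float()                                # (B,1,H,W)
    chroma = low_light / (low_light.sum(dim=1, keepdim=True) + 1e-4)
    return torch.cat((illum, reflect, shadow, chroma), dim=1)      # (B,8,H,W)

def diffusion_loss(eps_net, x0, low_light, t, alphas_cumprod):
    """Simplified noise-prediction loss with the control stack as conditioning."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    cond = scem_controls(low_light)
    return F.mse_loss(eps_net(torch.cat((x_t, cond), dim=1), t), noise)
```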
We present LoR-LUT, a unified low-rank formulation for compact and interpretable 3D lookup table (LUT) generation. Unlike conventional 3D-LUT-based techniques, which rely on fusing basis LUTs that are usually dense tensors, our unified approach extends the current framework by jointly using residual corrections, which are in fact low-rank tensors, together with a set of basis LUTs. This novel use of residual corrections is the primary reason the approach improves perceptual image quality. At the same time, we retain the same trilinear interpolation complexity while using significantly fewer network, residual-correction, and LUT parameters. Trained on the MIT-Adobe FiveK dataset, LoR-LUT reproduces expert-level retouching characteristics with high perceptual fidelity and a sub-megabyte model size. Furthermore, we introduce an interactive visualization tool, termed LoR-LUT Viewer, which transforms an input image into the LUT-adjusted output via sliders that control different parameters. The tool provides an effective way to enhance interpretability and user confidence in the visual results. Overall, our proposed formulation offers a compact, interpretable, and efficient direction for future LUT-based image enhancement and style transfer.
https://arxiv.org/abs/2602.22607
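The core formulation, basis-LUT fusion plus a low-rank (CP-factored) residual, and the trilinear lookup can be sketched as follows; the tensor shapes and the R/G/B axis ordering are assumptions:

```python
import torch
import torch.nn.functional as F

def low_rank_lut(basis, weights, u, v, w):
    """Compose a 3D LUT as a weighted sum of dense basis LUTs plus a rank-R
    residual built from CP factors (shapes illustrative):
      basis:   (K, 3, M, M, M)  dense basis LUTs
      weights: (K,)             image-adaptive mixing weights
      u, v, w: (R, 3, M)        CP factors of the low-rank residual."""
    fused = torch.einsum("k,kcijl->cijl", weights, basis)
    residual = torch.einsum("rci,rcj,rcl->cijl", u, v, w)
    return fused + residual                           # (3, M, M, M)

def apply_lut(lut, image):
    """Trilinear LUT lookup. `image` is (B, 3, H, W) in [0, 1]; the LUT axes are
    assumed ordered so that grid x/y/z index the R/G/B input values."""
    B, _, H, W = image.shape
    grid = image.permute(0, 2, 3, 1) * 2 - 1          # (B, H, W, 3) in [-1, 1]
    grid = grid.unsqueeze(1)                          # (B, 1, H, W, 3)
    lut = lut.unsqueeze(0).expand(B, -1, -1, -1, -1)  # (B, 3, M, M, M)
    out = F.grid_sample(lut, grid, mode="bilinear",
                        padding_mode="border", align_corners=True)
    return out.squeeze(2)                             # (B, 3, H, W)
```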
Robot perception under low light or high dynamic range is usually improved downstream, via more robust feature extraction, image enhancement, or closed-loop exposure control. However, all of these approaches are limited by the images captured under such conditions. An alternative is to use a programmable onboard light that adds to ambient illumination and improves the captured images. However, predicting its impact on image formation is not straightforward: illumination interacts nonlinearly with depth, surface reflectance, and scene geometry, and it can both reveal structure and induce failure modes such as specular highlights and saturation. We introduce Lightning, a closed-loop illumination-control framework for visual SLAM that combines relighting, offline optimization, and imitation learning in three stages. First, we train a Co-Located Illumination Decomposition (CLID) relighting model that decomposes a robot observation into an ambient component and a light-contribution field. CLID enables physically consistent synthesis of the same scene under alternative light intensities, thereby creating dense multi-intensity training data without requiring trajectories to be repeatedly re-run. Second, using these synthesized candidates, we formulate an offline Optimal Intensity Schedule (OIS) problem that selects illumination levels over a sequence, trading off SLAM-relevant image utility against power consumption and temporal smoothness. Third, we distill this ideal solution into a real-time controller through behavior cloning, producing an Illumination Control Policy (ILC) that generalizes beyond the initial training distribution and runs online on a mobile robot to command discrete light-intensity levels. Across our evaluation, Lightning substantially improves SLAM trajectory robustness while reducing unnecessary illumination power.
https://arxiv.org/abs/2602.15900
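CLID's relighting amounts to re-rendering a decomposed observation under a new commanded intensity; assuming an additive, intensity-linear light model (an illustration, not the paper's exact formulation):

```python
import torch

def relight(ambient: torch.Tensor, light_field: torch.Tensor,
            intensity: float) -> torch.Tensor:
    """Re-render a scene from its CLID-style decomposition: ambient component
    plus an intensity-scaled per-pixel contribution of the co-located light."""
    return (ambient + intensity * light_field).clamp(0.0, 1.0)

# Dense multi-intensity training data from a single decomposed frame:
# candidates = [relight(ambient, light_field, a) for a in torch.linspace(0, 1, 8)]
```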
Underwater Image Enhancement (UIE) is an ill-posed problem in which natural clean references are unavailable and degradation levels vary significantly across semantic regions. Existing UIE methods treat images with a single global model and ignore the inconsistent degradation of different scene components. This oversight leads to significant color distortions and loss of fine details in heterogeneous underwater scenes, especially where degradation varies strongly across image regions. We therefore propose SUCode (Semantic-aware Underwater Codebook Network), which achieves adaptive UIE through a semantic-aware discrete codebook representation. Compared with one-shot codebook-based methods, SUCode exploits a semantic-aware, pixel-level codebook representation tailored to heterogeneous underwater degradation. A three-stage training paradigm is employed to learn representations of raw underwater image features while avoiding pseudo-ground-truth contamination. A Gated Channel Attention Module (GCAM) and a Frequency-Aware Feature Fusion (FAFF) module jointly integrate channel and frequency cues for faithful color restoration and texture recovery. Extensive experiments on multiple benchmarks demonstrate that SUCode achieves state-of-the-art performance, outperforming recent UIE methods on both reference and no-reference metrics. The code will be made publicly available at this https URL.
https://arxiv.org/abs/2602.10586
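The pixel-level codebook representation boils down to nearest-neighbor lookup with a straight-through gradient, as in standard VQ models; SUCode's semantic-aware selection adds machinery on top of this:

```python
import torch

def quantize(features: torch.Tensor, codebook: torch.Tensor):
    """Pixel-level discrete-codebook lookup: each spatial feature vector is
    replaced by its nearest codebook entry.
    features: (B, C, H, W), codebook: (N, C)."""
    B, C, H, W = features.shape
    flat = features.permute(0, 2, 3, 1).reshape(-1, C)          # (BHW, C)
    dists = torch.cdist(flat, codebook)                         # (BHW, N)
    idx = dists.argmin(dim=1)                                   # nearest code per pixel
    quantized = codebook[idx].view(B, H, W, C).permute(0, 3, 1, 2)
    # Straight-through estimator keeps gradients flowing to the encoder.
    return features + (quantized - features).detach(), idx.view(B, H, W)
```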
Thermal infrared sensors, whose operating wavelengths exceed the size of smoke particles, can capture imagery unaffected by darkness, dust, and smoke. This robustness has made them increasingly valuable for motion estimation and environmental perception in robotics, particularly in adverse conditions. Existing thermal odometry and mapping approaches, however, are predominantly geometric, often fail across diverse datasets, and lack the ability to produce dense maps. Motivated by the efficiency and high-quality reconstruction ability of recent Gaussian Splatting (GS) techniques, we propose TOM-GS, a thermal odometry and mapping method that integrates learning-based odometry with GS-based dense mapping. TOM-GS is among the first GS-based SLAM systems tailored to thermal cameras, featuring dedicated thermal image enhancement and monocular depth integration. Extensive experiments on motion estimation and novel-view rendering demonstrate that TOM-GS outperforms existing learning-based methods, confirming the benefits of learning-based pipelines for robust thermal odometry and dense reconstruction.
https://arxiv.org/abs/2602.07493
Muon Scattering Tomography (MST) is a promising non-invasive inspection technique, yet the practical application of short-time MST is hindered by poor image quality due to limited muon flux. To address this limitation, we propose a U-Net-based framework, trained on Point of Closest Approach (PoCA) images reconstructed from simulated MST data, to enhance image quality. When applied to experimental MST data, the framework significantly improves image quality, increasing the Structural Similarity Index Measure (SSIM) from 0.7232 to 0.9699 and decreasing the Learned Perceptual Image Patch Similarity (LPIPS) from 0.3604 to 0.0270. These results demonstrate that our method can effectively enhance low-statistics MST images, paving the way for the practical deployment of short-time MST.
https://arxiv.org/abs/2602.07060
Underwater photography presents significant inherent challenges, including reduced contrast, spatial blur, and wavelength-dependent color distortions. These effects can obscure the vibrancy of marine life, and awareness photographers in particular often face heavy post-processing pipelines to correct these distortions. We develop an image-to-image pipeline that learns to reverse underwater degradations by introducing a synthetic corruption pipeline and training a diffusion-based generator to invert its effects. Training and evaluation are performed on a small, high-quality dataset of awareness photography by Keith Ellenbogen. The proposed methodology achieves high perceptual consistency and strong generalization in synthesizing 512x768 images with a model of ~11M parameters, trained from scratch on ~2.5k images.
https://arxiv.org/abs/2602.05163
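The abstract does not detail the synthetic corruption pipeline; a simplified stand-in consistent with the degradations it lists (wavelength-dependent attenuation, veiling light, blur) could be:

```python
import torch
import torch.nn.functional as F

def underwater_corrupt(clean: torch.Tensor, depth: float = 1.5) -> torch.Tensor:
    """Illustrative underwater corruption following a simplified
    Jaffe-McGlamery-style image formation model: red attenuates fastest,
    a bluish veiling light is added, and a slight blur is applied.
    `clean` is (B, 3, H, W) in [0, 1]; coefficients are illustrative."""
    beta = torch.tensor([0.45, 0.12, 0.05]).view(1, 3, 1, 1)   # R, G, B attenuation
    trans = torch.exp(-beta * depth)                           # per-channel transmission
    veil = torch.tensor([0.05, 0.25, 0.35]).view(1, 3, 1, 1)   # bluish backscatter
    degraded = clean * trans + veil * (1 - trans)
    kernel = torch.full((3, 1, 5, 5), 1.0 / 25)                # 5x5 depthwise box blur
    return F.conv2d(F.pad(degraded, (2, 2, 2, 2), mode="reflect"),
                    kernel, groups=3)
```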
Underwater images suffer severe degradation due to wavelength-dependent attenuation, scattering, and illumination non-uniformity that vary across water types and depths. We propose an unsupervised Domain-Invariant Visual Enhancement and Restoration (DIVER) framework that integrates empirical correction with physics-guided modeling for robust underwater image enhancement. DIVER first applies either IlluminateNet for adaptive luminance enhancement or a Spectral Equalization Filter for spectral normalization. An Adaptive Optical Correction Module then refines hue and contrast using channel-adaptive filtering, while Hydro-OpticNet employs physics-constrained learning to compensate for backscatter and wavelength-dependent attenuation. The parameters of IlluminateNet and Hydro-OpticNet are optimized via unsupervised learning using a composite loss function. DIVER is evaluated on eight diverse datasets covering shallow, deep, and highly turbid environments, including both naturally low-light and artificially illuminated scenes, using reference and non-reference metrics. While state-of-the-art methods such as WaterNet, UDNet, and Phaseformer perform reasonably well in shallow water, their performance degrades in deep, unevenly illuminated, or artificially lit conditions. In contrast, DIVER consistently achieves best or near-best performance across all datasets, demonstrating strong domain-invariant capability. DIVER yields at least a 9% improvement over SOTA methods in UCIQE. On the low-light SeaThru dataset, where color-palette references enable direct evaluation of color restoration, DIVER achieves at least a 4.9% reduction in GPMAE compared to existing methods. Beyond visual quality, DIVER also improves robotic perception by enhancing ORB-based keypoint repeatability and matching performance, confirming its robustness across diverse underwater environments.
https://arxiv.org/abs/2601.22878
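As one plausible stand-in for the Spectral Equalization Filter, a gray-world-style per-channel gain equalizes the color channels; DIVER's actual filter is presumably more elaborate:

```python
import torch

def spectral_equalization(image: torch.Tensor) -> torch.Tensor:
    """Rescale each color channel so all channel means match the global mean,
    compensating wavelength-dependent attenuation. image: (B, 3, H, W) in [0, 1]."""
    means = image.mean(dim=(2, 3), keepdim=True)               # per-channel mean
    gain = means.mean(dim=1, keepdim=True) / (means + 1e-6)    # equalizing gains
    return (image * gain).clamp(0.0, 1.0)
```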
Ultrasound (US) interpretation is hampered by multiplicative speckle, acquisition blur from the point-spread function (PSF), and scanner- and operator-dependent artifacts. Supervised enhancement methods assume access to clean targets or known degradations, conditions rarely met in practice. We present a blind, self-supervised enhancement framework that jointly deconvolves and denoises B-mode images using a Swin Convolutional U-Net trained with a physics-guided degradation model. From each training frame, we extract rotated/cropped patches and synthesize inputs by (i) convolving with a Gaussian PSF surrogate and (ii) injecting noise via either spatial additive Gaussian noise or complex Fourier-domain perturbations that emulate phase/magnitude distortions. For US scans, clean-like targets are obtained via non-local low-rank (NLLR) denoising, removing the need for ground truth; for natural images, the originals serve as targets. Trained and validated on UDIAT B, JNU-IFM, and XPIE Set-P, and evaluated additionally on a 700-image PSFHS test set, the method achieves the highest PSNR/SSIM across Gaussian and speckle noise levels, with margins that widen under stronger corruption. Relative to MSANN, Restormer, and DnCNN, it typically preserves an extra ~1-4 dB PSNR and 0.05-0.15 SSIM in heavy Gaussian noise, and ~2-5 dB PSNR and 0.05-0.20 SSIM under severe speckle. Controlled PSF studies show reduced FWHM and higher peak gradients, evidence of resolution recovery without edge erosion. Used as a plug-and-play preprocessor, it consistently boosts Dice scores for fetal head and pubic symphysis segmentation. Overall, the approach offers a practical, assumption-light path to robust US enhancement that generalizes across datasets, scanners, and degradation types.
https://arxiv.org/abs/2601.21856
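The physics-guided degradation described above, a Gaussian PSF surrogate plus spatial or Fourier-domain noise, can be sketched directly; the kernel size and noise levels are illustrative:

```python
import torch
import torch.nn.functional as F

def degrade_patch(patch: torch.Tensor, sigma_psf: float = 1.2,
                  noise_std: float = 0.05, fourier: bool = False) -> torch.Tensor:
    """Synthesize a degraded input from a clean-like patch (B, 1, H, W):
    convolve with a separable Gaussian PSF surrogate, then inject either
    spatial additive Gaussian noise or a complex Fourier-domain perturbation
    emulating phase/magnitude distortions."""
    r = torch.arange(-4, 5, dtype=torch.float32)
    g = torch.exp(-(r ** 2) / (2 * sigma_psf ** 2))
    g = g / g.sum()
    k = (g[:, None] * g[None, :]).view(1, 1, 9, 9)             # 9x9 Gaussian PSF
    blurred = F.conv2d(F.pad(patch, (4, 4, 4, 4), mode="reflect"), k)

    if fourier:                                                # frequency-domain noise
        spec = torch.fft.fft2(blurred)
        mag = 1 + noise_std * torch.randn_like(spec.real)      # magnitude jitter
        phase = noise_std * torch.randn_like(spec.real)        # phase jitter
        return torch.fft.ifft2(spec * mag * torch.exp(1j * phase)).real
    return blurred + noise_std * torch.randn_like(blurred)     # spatial noise
```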
The demand for accurate on-device pattern recognition in edge applications is intensifying, yet existing approaches struggle to reconcile accuracy with computational constraints. To address this challenge, a resource-aware hierarchical network based on multi-spectral fusion and interpretable modules, namely the Hierarchical Parallel Pseudo-image Enhancement Fusion Network (HPPI-Net), is proposed for real-time, on-device Human Activity Recognition (HAR). Deployed on an ARM Cortex-M4 microcontroller for low-power real-time inference, HPPI-Net achieves 96.70% accuracy while utilizing only 22.3 KiB of RAM and 439.5 KiB of ROM after optimization. HPPI-Net employs a two-layer architecture. The first layer extracts preliminary features using Fast Fourier Transform (FFT) spectrograms, while the second layer selectively activates either a dedicated module for stationary activity recognition or a parallel LSTM-MobileNet network (PLMN) for dynamic states. PLMN fuses FFT, Wavelet, and Gabor spectrograms through three parallel LSTM encoders and refines the concatenated features using Efficient Channel Attention (ECA) and Depthwise Separable Convolution (DSC), thereby offering channel-level interpretability while substantially reducing multiply-accumulate operations. Compared with MobileNetV3, HPPI-Net improves accuracy by 1.22% and reduces RAM usage by 71.2% and ROM usage by 42.1%. These results demonstrate that HPPI-Net achieves a favorable accuracy-efficiency trade-off and provides explainable predictions, establishing a practical solution for wearable, industrial, and smart home HAR on memory-constrained edge platforms.
https://arxiv.org/abs/2602.00152
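Of the modules named above, Efficient Channel Attention (ECA) has a well-known minimal form: global average pooling followed by a 1D convolution across channels, with no dimensionality reduction:

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: squeeze to per-channel statistics, then a
    1D convolution models local cross-channel interaction and produces gates.
    Kernel size 3 is a typical choice."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                         # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                    # squeeze: (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # cross-channel interaction
        gate = self.sigmoid(y)[:, :, None, None]
        return x * gate                           # channel-wise recalibration
```

Its channel gates can be inspected directly, which is the channel-level interpretability the abstract refers to.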
Low-light image super-resolution (LLSR) is a challenging task due to the coupled degradation of low resolution and poor illumination. To address this, we propose the Guided Texture and Feature Modulation Network (GTFMN), a novel framework that decouples the LLSR task into two sub-problems: illumination estimation and texture restoration. First, our network employs a dedicated Illumination Stream that predicts a spatially varying illumination map to accurately capture the lighting distribution. This map is then used as an explicit guide within our novel Illumination Guided Modulation Block (IGM Block) to dynamically modulate features in the Texture Stream. This mechanism achieves spatially adaptive restoration, enabling the network to intensify enhancement in poorly lit regions while preserving details in well-exposed areas. Extensive experiments demonstrate that GTFMN achieves the best performance among competing methods on the OmniNormal5 and OmniNormal15 datasets, outperforming them in both quantitative metrics and visual quality.
https://arxiv.org/abs/2601.19157
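The IGM Block's role, using the illumination map to modulate texture features per pixel, can be sketched as FiLM-style modulation; this is an illustrative stand-in, not the paper's block:

```python
import torch
import torch.nn as nn

class IGMBlock(nn.Module):
    """Sketch of illumination-guided modulation: the predicted illumination map
    is mapped to spatially varying scale and shift parameters that modulate
    texture features, so restoration strength adapts to local lighting."""
    def __init__(self, channels):
        super().__init__()
        self.to_scale = nn.Conv2d(1, channels, 3, padding=1)
        self.to_shift = nn.Conv2d(1, channels, 3, padding=1)

    def forward(self, texture_feat, illum_map):    # (B,C,H,W), (B,1,H,W)
        scale = self.to_scale(illum_map)           # spatially varying gain
        shift = self.to_shift(illum_map)
        return texture_feat * (1 + scale) + shift  # FiLM-style modulation
```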
In the current era of the mobile internet, Lightweight Low-Light Image Enhancement (L3IE) is critical for mobile devices, yet it faces a persistent trade-off between visual quality and model compactness. While recent methods employ disentangling strategies, such as Retinex theory and YUV color-space transformations, to simplify lightweight architectural design, their performance is fundamentally limited because they overlook channel-specific degradation patterns and cross-channel interactions. To address this gap, we perform a frequency-domain analysis that confirms the superiority of the YUV color space for L3IE and identifies a key insight: the Y channel primarily loses low-frequency content, while the UV channels are corrupted by high-frequency noise. Leveraging this finding, we propose a novel YUV-based paradigm that strategically restores the channels using a Dual-Stream Global-Local Attention module for the Y channel, a Y-guided Local-Aware Frequency Attention module for the UV channels, and a Guided Interaction module for final feature fusion. Extensive experiments validate that our model establishes a new state of the art on multiple benchmarks, delivering superior visual quality with a significantly lower parameter count.
https://arxiv.org/abs/2601.17349
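The channel split that the analysis above relies on is a standard RGB-to-YUV conversion (BT.601 coefficients), after which Y and UV can be routed to their dedicated attention modules:

```python
import torch

def rgb_to_yuv(rgb: torch.Tensor):
    """Split an sRGB image (B, 3, H, W) into luminance Y and chrominance UV so
    each can be restored separately: low frequencies recovered in Y,
    high-frequency noise suppressed in UV (BT.601 coefficients)."""
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return y, torch.cat((u, v), dim=1)            # (B,1,H,W), (B,2,H,W)
```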