Ultra-low-field (ULF) MRI promises broader accessibility but suffers from low signal-to-noise ratio (SNR), reduced spatial resolution, and contrasts that deviate from high-field standards. Image-to-image translation can map ULF images to a high-field appearance, yet efficacy is limited by scarce paired training data. Working within the ULF-EnC challenge constraints (50 paired 3D volumes; no external data), we study how task-adapted data augmentations impact a standard deep model for ULF image enhancement. We show that strong, diverse augmentations, including auxiliary tasks on high-field data, substantially improve fidelity. Our submission ranked third by brain-masked SSIM on the public validation leaderboard and fourth by the official score on the final test leaderboard. Code is available at this https URL.
https://arxiv.org/abs/2511.09366
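As a rough illustration of the kind of task-adapted augmentation the abstract describes, the sketch below applies identical geometric transforms to a paired ULF/high-field volume and perturbs only the ULF input; the specific transforms and parameter ranges are assumptions for illustration, not the challenge submission's recipe.

```python
import numpy as np

def augment_pair(ulf, hf, rng=np.random.default_rng()):
    """Apply the same geometric transform to a paired (ULF, high-field) volume,
    plus an intensity/noise perturbation on the ULF input only.
    Illustrative transforms; not the submission's exact augmentation set."""
    # Random flips along each spatial axis (applied to both volumes).
    for axis in range(3):
        if rng.random() < 0.5:
            ulf, hf = np.flip(ulf, axis=axis), np.flip(hf, axis=axis)
    # Random 90-degree rotation in one plane, again applied to both.
    k = rng.integers(0, 4)
    ulf, hf = np.rot90(ulf, k, axes=(0, 1)), np.rot90(hf, k, axes=(0, 1))
    # ULF-only perturbations: gamma shift plus additive Gaussian noise to
    # mimic the contrast/SNR variability of ultra-low-field acquisitions.
    gamma = rng.uniform(0.8, 1.2)
    ulf = np.clip(ulf, 0, None) ** gamma
    ulf = ulf + rng.normal(0.0, 0.02 * ulf.std(), size=ulf.shape)
    return ulf.copy(), hf.copy()
```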
We present IllumFlow, a novel framework that synergizes conditional Rectified Flow (CRF) with Retinex theory for low-light image enhancement (LLIE). Our model addresses low-light enhancement through separate optimization of illumination and reflectance components, effectively handling both lighting variations and noise. Specifically, we first decompose an input image into reflectance and illumination components following Retinex theory. To model the wide dynamic range of illumination variations in low-light images, we propose a conditional rectified flow framework that represents illumination changes as a continuous flow field. Since complex noise primarily resides in the reflectance component, we introduce a denoising network, enhanced by flow-derived data augmentation, to remove reflectance noise and chromatic aberration while preserving color fidelity. IllumFlow enables precise illumination adaptation across lighting conditions while naturally supporting customizable brightness enhancement. Extensive experiments on low-light enhancement and exposure correction demonstrate superior quantitative and qualitative performance over existing methods.
https://arxiv.org/abs/2511.02411
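A minimal sketch of the two ingredients named above, assuming (B,C,H,W) tensors in [0,1]: a naive Retinex split into reflectance and illumination, and the linear-interpolation training pair of a rectified flow over the illumination component. IllumFlow's learned decomposition and conditioning will differ; this only fixes the notation.

```python
import torch

def retinex_decompose(img, eps=1e-4):
    """Naive Retinex split: illumination as the channel-wise max (a common
    initial estimate), reflectance as the ratio. This only illustrates the
    I = R * L model; the paper learns its decomposition."""
    illum = img.max(dim=1, keepdim=True).values          # (B,1,H,W)
    refl = img / (illum + eps)                            # (B,C,H,W)
    return refl, illum

def rectified_flow_pair(illum_low, illum_normal, t):
    """One rectified-flow training pair: the linear interpolation between the
    low-light and normal-light illumination maps and its constant velocity
    target, which a conditional velocity network would regress."""
    x_t = (1.0 - t) * illum_low + t * illum_normal
    v_target = illum_normal - illum_low
    return x_t, v_target
```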
Low-light image enhancement (LLIE) faces persistent challenges in balancing reconstruction fidelity with cross-scenario generalization. While existing methods predominantly focus on deterministic pixel-level mappings between paired low/normal-light images, they often neglect the continuous physical process of luminance transitions in real-world environments, leading to performance drops when normal-light references are unavailable. Inspired by empirical analysis of natural luminance dynamics revealing power-law distributed intensity transitions, this paper introduces Luminance-Aware Statistical Quantification (LASQ), a novel framework that reformulates LLIE as a statistical sampling process over hierarchical luminance distributions. LASQ re-conceptualizes the luminance transition as a power-law distribution in intensity coordinate space that can be approximated by stratified power functions, thereby replacing deterministic mappings with probabilistic sampling over continuous luminance layers. A diffusion forward process is designed to autonomously discover optimal transition paths between luminance layers, achieving unsupervised distribution emulation without normal-light references. This considerably improves performance in practical situations, enabling more adaptable and versatile light restoration. The framework is also readily applicable to cases with normal-light references, where it achieves superior performance on domain-specific datasets alongside better generalization across non-reference datasets.
https://arxiv.org/abs/2511.01510
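To make the power-law framing concrete, the toy sketch below draws one luminance "layer" by applying a random power-law curve to an image in [0,1]. The exponent range is an assumption, and the actual LASQ framework discovers transition paths with a diffusion process rather than this direct draw.

```python
import numpy as np

def sample_power_law_layer(img, alpha_range=(0.4, 1.0),
                           rng=np.random.default_rng()):
    """Draw one luminance 'layer' via a random power-law (gamma) curve, a
    stand-in for the stratified power functions LASQ uses to approximate
    continuous luminance transitions. img is float in [0, 1]."""
    alpha = rng.uniform(*alpha_range)   # exponent < 1 lifts shadow intensities
    return np.clip(img, 0.0, 1.0) ** alpha

# A discrete stack of layers approximating a continuous transition path:
# layers = [sample_power_law_layer(low_img) for _ in range(8)]
```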
Medical image super-resolution (SR) is essential for enhancing diagnostic accuracy while reducing acquisition cost and scanning time. However, modeling both long-range anatomical structures and fine-grained frequency details with low computational overhead remains challenging. We propose FGMamba, a novel frequency-aware gated state-space model that unifies global dependency modeling and fine-detail enhancement into a lightweight architecture. Our method introduces two key innovations: a Gated Attention-enhanced State-Space Module (GASM) that integrates efficient state-space modeling with dual-branch spatial and channel attention, and a Pyramid Frequency Fusion Module (PFFM) that captures high-frequency details across multiple resolutions via FFT-guided fusion. Extensive evaluations across five medical imaging modalities (ultrasound, OCT, MRI, CT, and endoscopy) demonstrate that FGMamba achieves superior PSNR/SSIM while maintaining a compact parameter footprint (<0.75M), outperforming state-of-the-art CNN-based and Transformer-based methods. Our results validate the effectiveness of frequency-aware state-space modeling for scalable and accurate medical image enhancement.
https://arxiv.org/abs/2510.27296
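The sketch below illustrates the general idea of FFT-based high-frequency extraction at several resolutions, which is the kind of signal a pyramid frequency fusion module consumes. The mask shape, cutoff, and fuse-by-upsampling choices are assumptions, not the PFFM design.

```python
import torch
import torch.nn.functional as F

def high_freq_fft(x, cutoff=0.1):
    """Keep only high spatial frequencies of a (B,C,H,W) tensor by zeroing a
    centered low-frequency square in the shifted FFT spectrum."""
    X = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    B, C, H, W = x.shape
    h, w = int(H * cutoff), int(W * cutoff)
    mask = torch.ones(H, W, device=x.device)
    mask[H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w] = 0.0  # drop low band
    return torch.fft.ifft2(torch.fft.ifftshift(X * mask, dim=(-2, -1))).real

def pyramid_high_freq(x, levels=3):
    """High-frequency maps at several resolutions, upsampled back to the
    input size so a fusion module could combine them."""
    outs = []
    for i in range(levels):
        scaled = F.interpolate(x, scale_factor=0.5 ** i, mode="bilinear",
                               align_corners=False) if i else x
        hf = high_freq_fft(scaled)
        outs.append(F.interpolate(hf, size=x.shape[-2:], mode="bilinear",
                                  align_corners=False))
    return outs
```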
We propose an innovative enhancement to the Mamba framework by increasing the Hausdorff dimension of its scanning pattern through a novel Hilbert Selective Scan mechanism. This mechanism explores the feature space more effectively, capturing intricate fine-scale details and improving overall coverage. As a result, it mitigates information inconsistencies while refining spatial locality to better capture subtle local interactions without sacrificing the model's ability to handle long-range dependencies. Extensive experiments on publicly available benchmarks demonstrate that our approach significantly improves both the quantitative metrics and qualitative visual fidelity of existing Mamba-based low-light image enhancement methods, all while reducing computational resource consumption and shortening inference time. We believe that this refined strategy not only advances the state-of-the-art in low-light image enhancement but also holds promise for broader applications in fields that leverage Mamba-based techniques.
https://arxiv.org/abs/2510.26001
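For reference, the standard distance-to-coordinate conversion below produces a Hilbert-curve visitation order for a 2^k x 2^k feature map; reordering tokens by these indices before a selective-scan pass is the essence of a Hilbert scan. The paper's exact integration with Mamba blocks is not reproduced here.

```python
def hilbert_d2xy(order, d):
    """Map a distance d along a Hilbert curve covering a 2**order x 2**order
    grid to (x, y) coordinates (standard iterative d2xy conversion)."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan_indices(order):
    """Flat indices that reorder a (2**order, 2**order) feature map into
    Hilbert-curve order before a selective-scan (Mamba-style) pass."""
    n = 1 << order
    coords = [hilbert_d2xy(order, d) for d in range(n * n)]
    return [y * n + x for x, y in coords]

# tokens_hilbert = tokens[:, hilbert_scan_indices(order), :]   # (B, L, C)
```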
Digital image processing involves the systematic handling of images using advanced computer algorithms, and has gained significant attention in both academic and practical fields. Image enhancement is a crucial preprocessing stage in the image-processing chain, improving image quality and emphasizing features. This makes subsequent tasks (segmentation, feature extraction, classification) more reliable. Image enhancement is essential for rice leaf analysis, aiding in disease detection, nutrient deficiency evaluation, and growth analysis. Denoising followed by contrast enhancement forms the primary processing sequence. Image filters, generally employed for denoising, transform or enhance visual characteristics such as brightness, contrast, and sharpness, playing a crucial role in improving overall image quality and enabling the extraction of useful information. This work provides an extensive comparative study of well-known image-denoising methods combined with CLAHE (Contrast Limited Adaptive Histogram Equalization) for efficient enhancement of rice leaf images. The experiments were performed on a rice leaf image dataset to ensure the data is relevant and representative. Results were examined using various metrics to comprehensively assess the enhancement methods. This approach provides a strong basis for evaluating the effectiveness of methodologies in digital image processing and reveals insights useful for future adaptation in agricultural research and other domains.
https://arxiv.org/abs/2511.00046
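One representative denoise-then-CLAHE combination from the families compared in such studies is sketched below with OpenCV; the non-local-means parameters and CLAHE clip limit are illustrative defaults, not the paper's tuned settings.

```python
import cv2

def denoise_then_clahe(bgr, clip_limit=2.0, tile=(8, 8)):
    """Non-local-means denoising followed by CLAHE on the lightness channel
    (LAB space). Parameters are illustrative defaults."""
    den = cv2.fastNlMeansDenoisingColored(bgr, None, 10, 10, 7, 21)
    lab = cv2.cvtColor(den, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    l = clahe.apply(l)
    return cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

# enhanced = denoise_then_clahe(cv2.imread("rice_leaf.jpg"))
```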
Low-light image enhancement (LLIE) aims at improving the perception or interpretability of an image captured in an environment with poor illumination. With the advent of deep learning, LLIE techniques have achieved significant breakthroughs. However, existing LLIE methods either ignore the important role of frequency-domain information or fail to effectively promote the propagation and flow of information, limiting LLIE performance. In this paper, we develop a novel frequency-spatial interaction-driven network (FSIDNet) for LLIE based on a two-stage architecture. Specifically, the first stage is designed to restore the amplitude of low-light images to improve the lightness, and the second stage is devoted to restoring phase information to refine fine-grained structures. Considering that frequency-domain and spatial-domain information are complementary and both favorable for LLIE, we further develop two frequency-spatial interaction blocks which mutually amalgamate the complementary spatial and frequency information to enhance the capability of the model. In addition, we construct an Information Exchange Module (IEM) to associate the two stages by adequately incorporating cross-stage and cross-scale features, effectively promoting the propagation and flow of information in the two-stage network structure. Finally, we conduct experiments on several widely used benchmark datasets (i.e., LOL-Real, LSRW-Huawei, etc.), which demonstrate that our method achieves excellent performance in terms of visual results and quantitative metrics while preserving good model efficiency.
https://arxiv.org/abs/2510.22154
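The two quantities the two stages restore can be separated with a plain FFT, as sketched below: amplitude carries lightness, phase carries fine structure. Swapping a low-light amplitude for a normal-light one already brightens an image while preserving its layout, which is the intuition behind the amplitude-first, phase-second design; the network modules themselves are not reproduced here.

```python
import torch

def amplitude_phase(x):
    """Split a (B,C,H,W) tensor into its FFT amplitude and phase."""
    X = torch.fft.fft2(x)
    return torch.abs(X), torch.angle(X)

def recombine(amplitude, phase):
    """Rebuild a spatial image from amplitude and phase components."""
    return torch.fft.ifft2(amplitude * torch.exp(1j * phase)).real

# Stage-1 intuition: a normal-light amplitude with the low-light phase is
# already brighter, while the phase keeps the scene structure intact.
# bright = recombine(amp_normal, phase_low)
```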
Current generative super-resolution methods show strong performance on natural images but distort text, creating a fundamental trade-off between image quality and textual readability. To address this, we introduce \textbf{TIGER} (\textbf{T}ext-\textbf{I}mage \textbf{G}uided sup\textbf{E}r-\textbf{R}esolution), a novel two-stage framework that breaks this trade-off through a \textit{"text-first, image-later"} paradigm. \textbf{TIGER} explicitly decouples glyph restoration from image enhancement: it first reconstructs precise text structures and then uses them to guide subsequent full-image super-resolution. This glyph-to-image guidance ensures both high fidelity and visual consistency. To support comprehensive training and evaluation, we also contribute the \textbf{UltraZoom-ST} (UltraZoom-Scene Text), the first scene text dataset with extreme zoom (\textbf{$\times$14.29}). Extensive experiments show that \textbf{TIGER} achieves \textbf{state-of-the-art} performance, enhancing readability while preserving overall image quality.
https://arxiv.org/abs/2510.21590
Retinex-based low-light image enhancement methods are widely used due to their excellent performance. However, most of them are time-consuming for large-sized images. This paper extends the Retinex model from the spatial domain to the histogram domain, and proposes a novel histogram-based Retinex model for fast low-light image enhancement, named HistRetinex. Firstly, we define the histogram location matrix and the histogram count matrix, which establish the relationship among the histograms of the illumination, the reflectance and the low-light image. Secondly, based on the prior information and the histogram-based Retinex model, we construct a novel two-level optimization model. Through solving the optimization model, we give the iterative formulas of the illumination histogram and the reflectance histogram, respectively. Finally, we enhance the low-light image by matching its histogram with the one provided by HistRetinex. Experimental results demonstrate that HistRetinex outperforms existing enhancement methods in both visibility and performance metrics, executing in 1.86 seconds on 1000×664 images and saving at least 6.67 seconds relative to the compared methods.
https://arxiv.org/abs/2510.21100
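The final enhancement step is ordinary histogram matching; a sketch with scikit-image is below. In the paper the target histogram comes from the optimized illumination and reflectance histograms rather than from a reference image, so the target argument here is a stand-in.

```python
from skimage.exposure import match_histograms

def enhance_by_histogram_match(low_img, target_img):
    """Remap the low-light image so its histogram follows a target histogram,
    as in HistRetinex's last step. Here the target histogram is supplied via a
    reference image; the paper derives it from its two-level optimization."""
    return match_histograms(low_img, target_img, channel_axis=-1)
```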
Accurate 6D pose estimation is essential for robotic manipulation in industrial environments. Existing pipelines typically rely on off-the-shelf object detectors followed by cropping and pose refinement, but their performance degrades under challenging conditions such as clutter, poor lighting, and complex backgrounds, making detection the critical bottleneck. In this work, we introduce a standardized and plug-in pipeline for 2D detection of unseen objects in industrial settings. Based on current SOTA baselines, our approach reduces domain shift and background artifacts through low-light image enhancement and background removal guided by open-vocabulary detection with foundation models. This design suppresses the false positives prevalent in raw SAM outputs, yielding more reliable detections for downstream pose estimation. Extensive experiments on real-world industrial bin-picking benchmarks from BOP demonstrate that our method significantly boosts detection accuracy while incurring negligible inference overhead, showing the effectiveness and practicality of the proposed method.
https://arxiv.org/abs/2510.21000
Consumer-grade camera systems often struggle to maintain stable image quality under complex illumination conditions such as low light, high dynamic range, and backlighting, as well as spatial color temperature variation. These issues lead to underexposure, color casts, and tonal inconsistency, which degrade the performance of downstream vision tasks. To address this, we propose ACamera-Net, a lightweight and scene-adaptive camera parameter adjustment network that directly predicts optimal exposure and white balance from RAW inputs. The framework consists of two modules: ACamera-Exposure, which estimates ISO to alleviate underexposure and contrast loss, and ACamera-Color, which predicts correlated color temperature and gain factors for improved color consistency. Optimized for real-time inference on edge devices, ACamera-Net can be seamlessly integrated into imaging pipelines. Trained on diverse real-world data with annotated references, the model generalizes well across lighting conditions. Extensive experiments demonstrate that ACamera-Net consistently enhances image quality and stabilizes perception outputs, outperforming conventional auto modes and lightweight baselines without relying on additional image enhancement modules.
https://arxiv.org/abs/2510.20550
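A small sketch of how predicted exposure and white-balance parameters would be applied to a linear RGB rendering of the RAW input; the function and parameter names are hypothetical, and a network like ACamera-Net only predicts the values while the application step itself is generic.

```python
import numpy as np

def apply_camera_params(rgb_linear, exposure_gain, wb_gains):
    """Apply a predicted exposure gain and per-channel white-balance gains to
    a linear (H,W,3) RGB image in [0, 1]. Illustrative of how predicted ISO
    and color-temperature gains translate into pixel operations."""
    out = rgb_linear * exposure_gain * np.asarray(wb_gains).reshape(1, 1, 3)
    return np.clip(out, 0.0, 1.0)

# Example: brighten by 2x and correct a warm cast (hypothetical values).
# corrected = apply_camera_params(raw_rgb, 2.0, (1.8, 1.0, 1.4))
```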
Diffusion-based methods, leveraging pre-trained large models like Stable Diffusion via ControlNet, have achieved remarkable performance in several low-level vision tasks. However, Pre-Trained Diffusion-Based (PTDB) methods often sacrifice content fidelity to attain higher perceptual realism. This issue is exacerbated in low-light scenarios, where severely degraded information caused by the darkness limits effective control. We identify two primary causes of fidelity loss: the absence of suitable conditional latent modeling and the lack of bidirectional interaction between the conditional latent and noisy latent in the diffusion process. To address this, we propose a novel optimization strategy for conditioning in pre-trained diffusion models, enhancing fidelity while preserving realism and aesthetics. Our method introduces a mechanism to recover spatial details lost during VAE encoding, i.e., a latent refinement pipeline incorporating generative priors. Additionally, the refined latent condition interacts dynamically with the noisy latent, leading to improved restoration performance. Our approach is plug-and-play, seamlessly integrating into existing diffusion networks to provide more effective control. Extensive experiments demonstrate significant fidelity improvements in PTDB methods.
https://arxiv.org/abs/2510.17105
Low-light RAW image enhancement remains a challenging task. Although numerous deep learning based approaches have been proposed, they still suffer from inherent limitations. A key challenge is how to simultaneously achieve strong enhancement quality and high efficiency. In this paper, we rethink the architecture for efficient low-light image signal processing (ISP) and introduce a Hierarchical Mixing Architecture (HiMA). HiMA leverages the complementary strengths of Transformer and Mamba modules to handle features at large and small scales, respectively, thereby improving efficiency while avoiding the ambiguities observed in prior two-stage frameworks. To further address uneven illumination with strong local variations, we propose Local Distribution Adjustment (LoDA), which adaptively aligns feature distributions across different local regions. In addition, to fully exploit the denoised outputs from the first stage, we design a Multi-prior Fusion (MPF) module that integrates spatial and frequency-domain priors for detail enhancement. Extensive experiments on multiple public datasets demonstrate that our method outperforms state-of-the-art approaches, achieving superior performance with fewer parameters. Code will be released at this https URL.
https://arxiv.org/abs/2510.15497
Low-light image enhancement (LLIE) aims to improve illumination while preserving high-quality color and texture. However, existing methods often fail to extract reliable feature representations due to severely degraded pixel-level information under low-light conditions, resulting in poor texture restoration, color inconsistency, and artifacts. To address these challenges, we propose LightQANet, a novel framework that introduces quantized and adaptive feature learning for low-light enhancement, aiming to achieve consistent and robust image quality across diverse lighting conditions. From the static modeling perspective, we design a Light Quantization Module (LQM) to explicitly extract and quantify illumination-related factors from image features. By enforcing structured light factor learning, LQM enhances the extraction of light-invariant representations and mitigates feature inconsistency across varying illumination levels. From the dynamic adaptation perspective, we introduce a Light-Aware Prompt Module (LAPM), which encodes illumination priors into learnable prompts to dynamically guide the feature learning process. LAPM enables the model to flexibly adapt to complex and continuously changing lighting conditions, further improving image enhancement. Extensive experiments on multiple low-light datasets demonstrate that our method achieves state-of-the-art performance, delivering superior qualitative and quantitative results across various challenging lighting scenarios.
https://arxiv.org/abs/2510.14753
Detecting camouflaged objects in underwater environments is crucial for marine ecological research and resource exploration. However, existing methods face two key challenges: underwater image degradation, including low contrast and color distortion, and the natural camouflage of marine organisms. Traditional image enhancement techniques struggle to restore critical features in degraded images, while camouflaged object detection (COD) methods developed for terrestrial scenes often fail to adapt to underwater environments due to the lack of consideration for underwater optical characteristics. To address these issues, we propose APGNet, an Adaptive Prior-Guided Network, which integrates a Siamese architecture with a novel prior-guided mechanism to enhance robustness and detection accuracy. First, we employ the Multi-Scale Retinex with Color Restoration (MSRCR) algorithm for data augmentation, generating illumination-invariant images to mitigate degradation effects. Second, we design an Extended Receptive Field (ERF) module combined with a Multi-Scale Progressive Decoder (MPD) to capture multi-scale contextual information and refine feature representations. Furthermore, we propose an adaptive prior-guided mechanism that hierarchically fuses position and boundary priors by embedding spatial attention in high-level features for coarse localization and using deformable convolution to refine contours in low-level features. Extensive experimental results on two public MAS datasets demonstrate that APGNet outperforms 15 state-of-the-art methods under widely used evaluation metrics.
https://arxiv.org/abs/2510.12056
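MSRCR itself is a classical algorithm; a compact version is sketched below for reference. The Gaussian scales and color-restoration constants are common defaults, not the paper's settings, and the output stretch is a simple min-max normalization.

```python
import cv2
import numpy as np

def msrcr(bgr, sigmas=(15, 80, 250), alpha=125.0, beta=46.0, eps=1.0):
    """Multi-Scale Retinex with Color Restoration, the augmentation APGNet
    uses to obtain illumination-invariant views. Constants are typical MSRCR
    defaults and purely illustrative."""
    img = bgr.astype(np.float64) + eps
    # Multi-scale Retinex: average log-ratio against Gaussian-blurred scales.
    msr = np.zeros_like(img)
    for s in sigmas:
        blur = cv2.GaussianBlur(img, (0, 0), s)
        msr += (np.log(img) - np.log(blur)) / len(sigmas)
    # Color restoration re-weights channels by their relative intensity.
    crf = beta * (np.log(alpha * img) - np.log(img.sum(axis=2, keepdims=True)))
    out = msr * crf
    out = (out - out.min()) / (out.max() - out.min() + 1e-8)
    return (out * 255).astype(np.uint8)
```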
Photo enhancement plays a crucial role in augmenting the visual aesthetics of a photograph. In recent years, photo enhancement methods have either focused on enhancement performance, producing powerful models that cannot be deployed on edge devices, or prioritized computational efficiency, resulting in inadequate performance for real-world applications. To this end, this paper introduces a pyramid network called LLF-LUT++, which integrates global and local operators through closed-form Laplacian pyramid decomposition and reconstruction. This approach enables fast processing of high-resolution images while also achieving excellent performance. Specifically, we utilize an image-adaptive 3D LUT that capitalizes on the global tonal characteristics of downsampled images, while incorporating two distinct weight fusion strategies to achieve coarse global image enhancement. To implement this strategy, we designed a spatial-frequency transformer weight predictor that effectively extracts the desired distinct weights by leveraging frequency features. Additionally, we apply local Laplacian filters to adaptively refine edge details in high-frequency components. After meticulously redesigning the network structure and transformer model, LLF-LUT++ not only achieves a 2.64 dB improvement in PSNR on the HDR+ dataset, but also further reduces runtime, with 4K resolution images processed in just 13 ms on a single GPU. Extensive experimental results on two benchmark datasets further show that the proposed approach performs favorably compared to state-of-the-art methods. The source code will be made publicly available at this https URL.
https://arxiv.org/abs/2510.11613
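The closed-form Laplacian pyramid split and merge that make this design cheap can be written in a few lines, as below: the global 3D LUT only needs to see the small low-frequency residual, while the stored detail bands are refined locally. The pyramid depth here is an assumption.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=3):
    """Closed-form Laplacian pyramid decomposition: each level stores the
    detail lost in one downsample/upsample round trip; the last entry is the
    small low-frequency residual a global 3D LUT can process cheaply."""
    pyr, cur = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(cur)
        up = cv2.pyrUp(down, dstsize=(cur.shape[1], cur.shape[0]))
        pyr.append(cur - up)          # detail band
        cur = down
    pyr.append(cur)                   # low-frequency residual
    return pyr

def reconstruct(pyr):
    """Exact inverse of the decomposition above."""
    cur = pyr[-1]
    for lap in reversed(pyr[:-1]):
        cur = cv2.pyrUp(cur, dstsize=(lap.shape[1], lap.shape[0])) + lap
    return cur
```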
Computer vision and image processing applications suffer from dark and low-light images, particularly during real-time image transmission. Currently, low-light and dark images are converted to bright and colored forms using autoencoders; however, these methods often achieve low SSIM and PSNR scores and require high computational power due to their large number of parameters. To address these challenges, the DeepFusionNet architecture has been developed. According to the results obtained on the LOL-v1 dataset, DeepFusionNet achieved an SSIM of 92.8% and a PSNR score of 26.30, while containing only approximately 2.5 million parameters. In addition, converting blurry, low-resolution images into high-resolution, blur-free images has gained importance in image processing applications. Unlike GAN-based super-resolution methods, an autoencoder-based super-resolution model has been developed that contains approximately 100 thousand parameters and uses the DeepFusionNet architecture. In tests, the DeepFusionNet-based super-resolution method achieved a PSNR of 25.30 and an SSIM score of 80.7% on the validation set.
https://arxiv.org/abs/2510.10122
Underwater images play a crucial role in ocean research and marine environmental monitoring since they provide quality information about the ecosystem. However, the complex and remote nature of the environment results in poor image quality with issues such as low visibility, blurry textures, color distortion, and noise. In recent years, research in image enhancement has proven to be effective but also presents its own limitations, like poor generalization and heavy reliance on clean datasets. One of the challenges herein is the lack of diversity and the low quality of images included in these datasets. Also, most existing datasets consist only of monocular images, a fact that limits the representation of different lighting conditions and angles. In this paper, we propose a new plan of action to overcome these limitations. On one hand, we call for expanding the datasets using a denoising diffusion model to include a variety of image types such as stereo, wide-angled, macro, and close-up images. On the other hand, we recommend enhancing the images using Controlnet to evaluate and increase the quality of the corresponding datasets, and hence improve the study of the marine ecosystem. Tags - Underwater Images, Denoising Diffusion, Marine ecosystem, Controlnet
https://arxiv.org/abs/2510.09934
We address the relatively underexplored task of thermal infrared image enhancement. Existing infrared image enhancement methods primarily focus on tackling individual degradations, such as noise, contrast, and blurring, making it difficult to handle coupled degradations. Meanwhile, all-in-one enhancement methods, commonly applied to RGB sensors, often demonstrate limited effectiveness due to the significant differences in imaging models. In light of this, we first revisit the imaging mechanism and introduce a Progressive Prompt Fusion Network (PPFN). Specifically, the PPFN initially establishes prompt pairs based on the thermal imaging process. For each type of degradation, we fuse the corresponding prompt pairs to modulate the model's features, providing adaptive guidance that enables the model to better address specific degradations under single or multiple conditions. In addition, a Selective Progressive Training (SPT) mechanism is introduced to gradually refine the model's handling of composite cases and align the enhancement process, which not only allows the model to remove camera noise and retain key structural details, but also enhances the overall contrast of the thermal image. Furthermore, we introduce a high-quality, multi-scenario infrared benchmark covering a wide range of scenes. Extensive experiments substantiate that our approach not only delivers promising visual results under specific degradations but also significantly improves performance on complex degradation scenes, achieving a notable 8.76% improvement. Code is available at this https URL.
https://arxiv.org/abs/2510.09343
We introduce a simple and efficient method to enhance and clarify images. More specifically, we deal with low-light image enhancement and clarification of hazy imagery (hazy/foggy images, images containing sand dust, and underwater images). Our method involves constructing an image filter to simulate low-light or hazy conditions and deriving approximate reverse filters to minimize distortions in the enhanced images. Experimental results show that our approach is highly competitive and often surpasses state-of-the-art techniques in handling extremely dark images and in enhancing hazy images. A key advantage of our approach lies in its simplicity: our method is implementable with just a few lines of MATLAB code.
https://arxiv.org/abs/2510.08358
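A generic sketch of the idea under assumed forms: if a forward filter simulates the degradation, a fixed-point iteration that repeatedly adds the residual between the observation and the re-degraded estimate acts as an approximate reverse filter. The toy haze filter and iteration count below are assumptions, not the paper's constructions, and the paper's own implementation is a few lines of MATLAB rather than Python.

```python
import cv2
import numpy as np

def simulate_haze(x, strength=0.5):
    """Toy degradation filter: blend toward a blurred, bright airlight.
    Stands in for whatever forward filter the paper constructs."""
    airlight = cv2.GaussianBlur(x, (0, 0), 15) * 0.3 + 0.7
    return (1 - strength) * x + strength * airlight

def approximate_reverse(y, forward, iters=20):
    """Fixed-point reverse filtering: repeatedly add the residual between the
    observation y and the re-degraded current estimate. A generic sketch of
    the approximate-reverse-filter idea, not the paper's closed form."""
    x = y.copy()
    for _ in range(iters):
        x = np.clip(x + (y - forward(x)), 0.0, 1.0)
    return x

# enhanced = approximate_reverse(hazy_img_float01, simulate_haze)
```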