RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement

2025-05-02 12:21:44
Kui Jiang, Yan Luo, Junjun Jiang, Xin Xu, Fei Ma, Fei Yu

Abstract

Underwater image enhancement (UIE) is a critical preprocessing step for marine vision applications, where wavelength-dependent attenuation causes severe content degradation and color distortion. Recent state space models such as Mamba show potential for modeling long-range dependencies. However, their unfolding operations and fixed scan paths over 1D sequences cannot adapt to local object semantics or model global relations, which limits their efficacy in complex underwater environments. To address this, we enhance conventional Mamba with a sorting-based scanning mechanism that dynamically reorders scanning sequences according to the statistical distribution of spatial correlations across all pixels. This encourages the network to prioritize the most informative components, namely structural and semantic features. Building on this mechanism, we devise a Visually Self-adaptive State Block (VSSB) that harmonizes the dynamic sorting of Mamba with input-dependent dynamic convolution, enabling coherent integration of global context and local relational cues. This design helps eliminate global focus bias, which greatly weakens the statistical frequency, especially for widely distributed content. For robust feature extraction and refinement, we design a cross-feature bridge (CFB) that adaptively fuses multi-scale representations. Together, these components form a novel relation-driven Mamba framework for effective UIE (RD-UIE). Extensive experiments on underwater enhancement benchmarks demonstrate that RD-UIE outperforms the state-of-the-art approach WMamba in both quantitative metrics and visual fidelity, achieving an average gain of 0.55 dB across the three benchmarks. Our code is available at this https URL
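The idea of a sorting-based scan can be illustrated with a small NumPy sketch. The abstract does not specify the exact scoring rule or state space parameterization, so everything below is an assumption rather than the RD-UIE implementation. Each pixel is scored by its mean cosine similarity to all pixels (itself included), pixels are scanned in descending score order, and a simplified fixed-coefficient linear recurrence (h_t = a*h_{t-1} + b*x_t) stands in for Mamba's input-dependent selective scan. The outputs are then scattered back to their original spatial positions. The function names (correlation_scores, sorted_scan_order, linear_scan, sorted_ssm_scan) and the parameters a and b are hypothetical.

```python
import numpy as np


def correlation_scores(feats):
    """Score each pixel by its mean cosine similarity to all pixels (itself included).

    feats: array of shape (H, W, C). Returns an array of shape (H*W,).
    ASSUMPTION: this scoring rule is illustrative, not taken from RD-UIE.
    """
    h, w, c = feats.shape
    x = feats.reshape(h * w, c)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    xn = x / np.maximum(norms, 1e-8)  # guard against all-zero feature vectors
    sim = xn @ xn.T                    # (N, N) pairwise cosine similarity
    return sim.mean(axis=1)


def sorted_scan_order(feats):
    """Pixel indices in descending score order (stable, so ties keep raster order)."""
    return np.argsort(-correlation_scores(feats), kind="stable")


def linear_scan(seq, a=0.9, b=1.0):
    """Fixed-coefficient recurrence h_t = a*h_{t-1} + b*x_t, emitting y_t = h_t.

    ASSUMPTION: a simplified stand-in for Mamba's input-dependent selective scan.
    """
    state = np.zeros(seq.shape[1])
    out = np.empty_like(seq)
    for t in range(seq.shape[0]):
        state = a * state + b * seq[t]
        out[t] = state
    return out


def sorted_ssm_scan(feats, a=0.9, b=1.0):
    """Flatten, reorder by score, scan, then scatter results back to spatial positions."""
    feats = np.asarray(feats, dtype=float)
    h, w, c = feats.shape
    flat = feats.reshape(h * w, c)
    order = sorted_scan_order(feats)
    y_sorted = linear_scan(flat[order], a, b)
    y = np.empty_like(y_sorted)
    y[order] = y_sorted                # inverse permutation: undo the sort
    return y.reshape(h, w, c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 5, 3))
    print(sorted_scan_order(feats)[:5])  # first five pixels visited by the scan
    print(sorted_ssm_scan(feats).shape)  # (4, 5, 3)
```

Setting a=0 reduces the recurrence to the identity, which is a quick check that scattering back to spatial positions exactly undoes the sort. The full N x N similarity matrix is quadratic in the number of pixels, so a practical implementation would likely need a cheaper correlation statistic; this sketch favors readability over efficiency.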

URL

https://arxiv.org/abs/2505.01224

PDF

https://arxiv.org/pdf/2505.01224.pdf
