Paper Reading AI Learner

Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement

2025-06-05 08:39:17
Niki Martinel, Rita Pucci

Abstract

We present a novel dual-stream architecture that achieves state-of-the-art underwater image enhancement by explicitly integrating the Jaffe-McGlamery physical model with capsule clustering-based feature representation learning. Our method simultaneously estimates transmission maps and spatially-varying background light through a dedicated physics estimator while extracting entity-level features via capsule clustering in a parallel stream. This physics-guided approach enables parameter-free enhancement that respects underwater formation constraints while preserving semantic structures and fine-grained details. Our approach also features a novel optimization objective ensuring both physical adherence and perceptual quality across multiple spatial frequencies. To validate our approach, we conducted extensive experiments across six challenging benchmarks. Results demonstrate consistent improvements of +0.5 dB PSNR over the best existing methods while requiring only one-third of their computational complexity (FLOPs), or alternatively, more than +1 dB PSNR improvement when compared to methods with similar computational budgets. Code and data will be available at this https URL.
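The physics estimator described above predicts a transmission map and spatially-varying background light. To make the role of these two quantities concrete, here is a minimal NumPy sketch of the simplified Jaffe-McGlamery-style formation model, I(x) = J(x)·t(x) + B(x)·(1 − t(x)), and its algebraic inversion. This is an illustration of the underlying physics only, not the paper's implementation; the function names and the simplified (direct + backscatter) form are assumptions.

```python
import numpy as np

def compose_underwater(J, t, B):
    """Simplified underwater formation model (direct + backscatter terms):
    observed I = clean J attenuated by transmission t, plus background
    light B scattered into the line of sight."""
    return J * t + B * (1.0 - t)

def invert_formation(I, t, B, eps=1e-6):
    """Recover the clean image given estimated transmission t and
    background light B (the quantities a physics estimator would predict).
    Transmission is clamped to avoid division blow-up in dark regions."""
    return (I - B * (1.0 - t)) / np.maximum(t, eps)
```

With a reasonable transmission estimate, inverting the formation model round-trips a synthetic image almost exactly; in practice the difficulty lies entirely in estimating `t` and `B` from a single degraded image, which is what the dedicated physics stream is for.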

Abstract (translated)

We propose a novel dual-stream architecture that achieves state-of-the-art underwater image enhancement by explicitly integrating the Jaffe-McGlamery physical model with capsule clustering-based feature representation learning. Our method simultaneously estimates transmission maps and spatially-varying background light in a dedicated physics estimator, while a parallel stream extracts entity-level features via capsule clustering. This physics-guided approach preserves semantic structures and fine-grained details while respecting underwater imaging constraints, and achieves enhancement without parameter tuning. In addition, our method introduces a novel optimization objective that ensures both physical adherence and perceptual quality across multiple spatial frequencies. To validate the effectiveness of the proposed method, we conducted extensive experiments on six challenging benchmark datasets. The results show that our method improves PSNR by +0.5 dB while using only one-third of the computational complexity (FLOPs) of the best existing methods, and by more than +1 dB PSNR compared with methods of similar computational budget. Code and data will be released at this [URL].

URL

https://arxiv.org/abs/2506.04753

PDF

https://arxiv.org/pdf/2506.04753.pdf

