Abstract
Underwater images are often affected by complex degradations such as light absorption, scattering, color casts, and artifacts, making enhancement critical for effective object detection, recognition, and scene understanding in aquatic environments. Existing methods, especially diffusion-based approaches, typically rely on synthetic paired datasets due to the scarcity of real underwater references, which introduces bias and limits generalization. Furthermore, fine-tuning these models can degrade their learned priors, resulting in unrealistic enhancements under domain shift. To address these challenges, we propose UDAN-CLIP, an image-to-image diffusion framework pre-trained on synthetic underwater datasets and augmented with a customized vision-language-model-based classifier, a spatial attention module, and a novel CLIP-Diffusion loss. The classifier preserves natural in-air priors and semantically guides the diffusion process, while the spatial attention module targets localized degradations such as haze and low contrast. The CLIP-Diffusion loss further strengthens visual-textual alignment and helps maintain semantic consistency during enhancement. Together, these contributions enable UDAN-CLIP to perform more effective underwater image enhancement, producing results that are not only visually compelling but also more realistic and detail-preserving. These improvements are consistently validated through both quantitative metrics and qualitative visual comparisons, demonstrating the model's ability to correct distortions and restore natural appearance in challenging underwater conditions.
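To make the CLIP-Diffusion loss concrete, here is a minimal sketch of one plausible formulation: a standard diffusion denoising loss combined with a CLIP image-text alignment term. This is not the authors' implementation; the abstract does not give the exact formula, so the weighting `lambda_clip` and the target prompt are illustrative assumptions. The sketch uses OpenAI's CLIP package (github.com/openai/CLIP), whose encoder is kept frozen to act as the in-air prior described above.

```python
# Hedged sketch (assumed, not the paper's code) of a CLIP-Diffusion loss:
# diffusion denoising MSE + (1 - cosine similarity) between CLIP embeddings
# of the enhanced image and a target text prompt.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()  # frozen in-air prior; only the diffusion model is trained

# Hypothetical prompt; the actual text conditioning is not specified here.
text_tokens = clip.tokenize(["a clear, natural underwater photo"]).to(device)

def clip_diffusion_loss(noise_pred, noise, enhanced_img, lambda_clip=0.1):
    """Combine the usual noise-prediction objective with a CLIP
    visual-textual alignment term on the enhanced image."""
    # Standard diffusion objective: predict the noise added at this step.
    diff_loss = F.mse_loss(noise_pred, noise)

    # CLIP expects 224x224 inputs with its own normalization; we assume
    # `enhanced_img` has already been preprocessed accordingly.
    img_feat = clip_model.encode_image(enhanced_img)
    txt_feat = clip_model.encode_text(text_tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    align_loss = 1.0 - (img_feat * txt_feat).sum(dim=-1).mean()

    return diff_loss + lambda_clip * align_loss
```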
URL
https://arxiv.org/abs/2505.19895