
A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement

2024-03-28 01:00:08
Junjie Wen, Jinqiang Cui, Benyun Zhao, Bingxin Han, Xuchen Liu, Zhi Gao, Ben M. Chen

Abstract

In recent years, significant progress has been made in the field of underwater image enhancement (UIE). However, its practical utility for high-level vision tasks, such as underwater object detection (UOD) on Autonomous Underwater Vehicles (AUVs), remains relatively unexplored. This gap may be attributed to several factors: (1) Existing methods typically employ UIE as a pre-processing step, which inevitably introduces considerable computational overhead and latency. (2) Enhancing images before training object detectors does not necessarily yield performance improvements. (3) Complex underwater environments can induce significant domain shifts across different scenarios, seriously degrading UOD performance. To address these challenges, we introduce EnYOLO, an integrated real-time framework designed for simultaneous UIE and UOD with domain-adaptation capability. Specifically, the UIE and UOD task heads share the same network backbone and both use a lightweight design. Furthermore, to ensure balanced training for both tasks, we present a multi-stage training strategy aimed at consistently enhancing their performance. Additionally, we propose a novel domain-adaptation strategy to align feature embeddings originating from diverse underwater environments. Comprehensive experiments demonstrate that our framework not only achieves state-of-the-art (SOTA) performance in both UIE and UOD tasks, but also shows superior adaptability when applied to different underwater scenarios. Our efficiency analysis further highlights the substantial potential of our framework for onboard deployment.
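The shared-backbone idea in the abstract can be illustrated with a minimal sketch. This is not the EnYOLO architecture: the class `SharedBackboneModel` and the three toy functions are hypothetical stand-ins, written in plain Python, that only show the structural point that one backbone pass per frame feeds both the enhancement head and the detection head, instead of running a separate enhancement network before the detector.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[float]     # toy stand-in: flat list of pixel intensities in [0, 1]
Features = List[float]  # toy stand-in for a backbone feature map


@dataclass
class SharedBackboneModel:
    """Hypothetical two-head model: one backbone pass serves both tasks."""
    backbone: Callable[[Image], Features]
    enhance_head: Callable[[Features], Image]
    detect_head: Callable[[Features], List[int]]
    backbone_calls: int = 0  # counts how often the expensive part runs

    def forward(self, image: Image) -> Tuple[Image, List[int]]:
        self.backbone_calls += 1
        feats = self.backbone(image)  # computed once per frame
        # Both lightweight heads reuse the same features.
        return self.enhance_head(feats), self.detect_head(feats)


def toy_backbone(image: Image) -> Features:
    # Assumption for illustration: "features" are mean-centred intensities.
    mean = sum(image) / len(image)
    return [p - mean for p in image]


def toy_enhance_head(feats: Features) -> Image:
    # Assumption for illustration: enhancement is a contrast stretch to [0, 1].
    lo, hi = min(feats), max(feats)
    span = (hi - lo) or 1.0
    return [(f - lo) / span for f in feats]


def toy_detect_head(feats: Features) -> List[int]:
    # Assumption for illustration: a "detection" is the index of any
    # feature that stands out from the mean by more than 0.1.
    return [i for i, f in enumerate(feats) if f > 0.1]


if __name__ == "__main__":
    model = SharedBackboneModel(toy_backbone, toy_enhance_head, toy_detect_head)
    enhanced, detections = model.forward([0.2, 0.3, 0.25, 0.6])
    print(enhanced, detections, model.backbone_calls)
```

In a real system the backbone and heads would be neural network modules; the sketch only mirrors the data flow, where the cost of the backbone is paid once and shared by both tasks.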

URL

https://arxiv.org/abs/2403.19079

PDF

https://arxiv.org/pdf/2403.19079.pdf

