Paper Reading AI Learner

Multi-label Video Classification for Underwater Ship Inspection

2023-05-27 02:38:54
Md Abulkalam Azad, Ahmed Mohammed, Maryna Waszak, Brian Elvesæter, Martin Ludvigsen

Abstract

Today, ship hull inspection, including the examination of the external coating, the detection of defects, and the assessment of other types of external degradation such as corrosion and marine growth, is conducted underwater by means of Remotely Operated Vehicles (ROVs). The inspection relies on manual video analysis, which is time-consuming and labor-intensive. To address this, we propose an automatic video analysis system based on deep learning and computer vision. It improves upon existing methods for underwater ship hull video inspection, which consider only the spatial information in individual frames. By exploring the benefits of adding temporal information and analyzing frame-based classifiers, we propose a multi-label video classification model that exploits the self-attention mechanism of transformers to capture spatiotemporal attention across consecutive video frames. Our proposed method has demonstrated promising results and can serve as a benchmark for future research and development in underwater video inspection applications.
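The abstract describes the model only at a high level. As a rough illustration of the two ideas it names, the following dependency-free Python sketch applies self-attention across consecutive frame embeddings and then uses one independent sigmoid per label for multi-label output. It is not the authors' architecture. The label names, the dimensions, the single attention head, the mean pooling over time, the random weights, and the 0.5 threshold are all illustrative assumptions.

```python
import math
import random

# Hypothetical label set, loosely based on degradation types named in the abstract.
LABELS = ["corrosion", "marine_growth", "coating_damage", "defect"]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def temporal_self_attention(frames, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention across T frame embeddings.

    Each output vector is a weighted mix of all frames in the clip, so
    information flows along the time axis.
    """
    q = [matvec(Wq, f) for f in frames]
    k = [matvec(Wk, f) for f in frames]
    v = [matvec(Wv, f) for f in frames]
    d = len(q[0])
    out = []
    for qi in q:
        weights = softmax([dot(qi, kj) / math.sqrt(d) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return out


def predict_labels(frames, params, threshold=0.5):
    """Return per-label probabilities and the labels at or above the threshold."""
    attended = temporal_self_attention(frames, params["Wq"], params["Wk"], params["Wv"])
    d = len(attended[0])
    # Mean-pool over time to obtain one clip-level vector (an assumed design choice).
    clip = [sum(f[c] for f in attended) / len(attended) for c in range(d)]
    logits = [dot(w, clip) + b for w, b in zip(params["W_out"], params["b_out"])]
    # Independent sigmoids: several labels can be active at the same time.
    probs = {lab: 1.0 / (1.0 + math.exp(-z)) for lab, z in zip(LABELS, logits)}
    active = [lab for lab, p in probs.items() if p >= threshold]
    return probs, active


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy demo: random numbers stand in for per-frame features from a frame encoder.
rng = random.Random(0)
T, D = 5, 8  # 5 consecutive frames, 8-dim features (placeholder sizes)
frames = random_matrix(rng, T, D)
params = {
    "Wq": random_matrix(rng, D, D),
    "Wk": random_matrix(rng, D, D),
    "Wv": random_matrix(rng, D, D),
    "W_out": random_matrix(rng, len(LABELS), D),
    "b_out": [0.0] * len(LABELS),
}
probs, active = predict_labels(frames, params)
print(probs, active)
```

The sigmoid per label, rather than a softmax across labels, is what makes the output multi-label. A clip showing both corrosion and marine growth can therefore activate both labels at once.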

URL

https://arxiv.org/abs/2305.17338

PDF

https://arxiv.org/pdf/2305.17338.pdf