Paper Reading AI Learner

MAS-SAM: Segment Any Marine Animal with Aggregated Features

2024-04-24 07:38:14
Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu

Abstract

Recently, the Segment Anything Model (SAM) has shown exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a general-purpose vision model, SAM is primarily trained on large-scale natural-light images. In underwater scenes, it exhibits substantial performance degradation due to light scattering and absorption. Meanwhile, the simplicity of SAM's decoder may lead to the loss of fine-grained object details. To address these issues, we propose a novel feature learning framework for marine animal segmentation, named MAS-SAM, which integrates effective adapters into SAM's encoder and constructs a pyramidal decoder. More specifically, we first build a new SAM encoder with effective adapters for underwater scenes. Then, we introduce a Hypermap Extraction Module (HEM) to generate multi-scale features for comprehensive guidance. Finally, we propose a Progressive Prediction Decoder (PPD) to aggregate the multi-scale features and predict the final segmentation results. Combined with the Fusion Attention Module (FAM), our method extracts richer marine information, from global contextual cues to fine-grained local details. Extensive experiments on four public MAS datasets demonstrate that MAS-SAM obtains better results than other typical segmentation methods. The source code is available at this https URL.
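The abstract describes inserting adapters into SAM's frozen encoder but does not detail their design. As a rough illustration of the general adapter idea only (not MAS-SAM's exact module), here is a minimal NumPy sketch of a residual bottleneck adapter of the kind commonly added to transformer blocks; all names, dimensions, and the zero-initialization choice are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU nonlinearity
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class BottleneckAdapter:
    """Residual bottleneck adapter: down-project, nonlinearity, up-project, skip.

    Inserted after a (frozen) transformer sub-layer; only these small
    projection matrices would be trained for the new (underwater) domain.
    """
    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.standard_normal((dim, bottleneck)) * 0.02
        # Zero-init the up-projection so the adapter starts as an identity
        # mapping and does not perturb the pretrained encoder at step 0.
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, x):
        return x + gelu(x @ self.w_down) @ self.w_up

# Hypothetical ViT-style patch tokens: 196 tokens of width 768.
tokens = np.random.default_rng(1).standard_normal((196, 768))
adapter = BottleneckAdapter(dim=768, bottleneck=64)
out = adapter(tokens)
```

Because the up-projection starts at zero, `out` equals `tokens` before any training, which is a common way to graft adapters onto a pretrained model without disturbing its initial behavior.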


URL

https://arxiv.org/abs/2404.15700

PDF

https://arxiv.org/pdf/2404.15700.pdf

