ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentation

2024-02-08 18:41:41
Aimee Guo, Grace Fei, Hemanth Pasupuleti, Jing Wang

Abstract

The newly released Segment Anything Model (SAM) is a popular tool used in image processing due to its superior segmentation accuracy, variety of input prompts, training capabilities, and efficient model design. However, its current model is trained on a diverse dataset not tailored to medical images, particularly ultrasound images. Ultrasound images tend to have a lot of noise, making it difficult to segment out important structures. In this project, we developed ClickSAM, which fine-tunes the Segment Anything Model using click prompts for ultrasound images. ClickSAM has two stages of training: the first stage is trained on single-click prompts centered in the ground-truth contours, and the second stage focuses on improving the model performance through additional positive and negative click prompts. By comparing the first stage predictions to the ground-truth masks, true positive, false positive, and false negative segments are calculated. Positive clicks are generated using the true positive and false negative segments, and negative clicks are generated using the false positive segments. The Centroidal Voronoi Tessellation algorithm is then employed to collect positive and negative click prompts in each segment that are used to enhance the model performance during the second stage of training. With click-train methods, ClickSAM exhibits superior performance compared to other existing models for ultrasound image segmentation.

URL

https://arxiv.org/abs/2402.05902

PDF

https://arxiv.org/pdf/2402.05902.pdf

