How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model

2024-04-15 17:31:32
Hanxue Gu, Haoyu Dong, Jichen Yang, Maciej A. Mazurowski

Abstract

Automated segmentation is a fundamental medical image analysis task that has advanced significantly with the advent of deep learning. While foundation models have been useful in natural language processing and some vision tasks for some time, the foundation model developed with image segmentation in mind, the Segment Anything Model (SAM), was developed only recently and has shown similar promise. However, there are still no systematic analyses or "best-practice" guidelines for optimal fine-tuning of SAM for medical image segmentation. This work summarizes existing fine-tuning strategies with various backbone architectures, model components, and fine-tuning algorithms across 18 combinations, and evaluates them on 17 datasets covering all common radiology modalities. Our study reveals that (1) fine-tuning SAM leads to slightly better performance than previous segmentation methods, (2) fine-tuning strategies that use parameter-efficient learning in both the encoder and decoder are superior to other strategies, (3) network architecture has a small impact on final performance, and (4) further training SAM with self-supervised learning can improve final model performance. We also demonstrate the ineffectiveness of some methods popular in the literature and further expand our experiments into few-shot and prompt-based settings. Lastly, we released our code and MRI-specific fine-tuned weights, which consistently obtained superior performance over the original SAM, at this https URL.
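
To make the phrase "parameter-efficient learning" concrete, the arithmetic below compares the number of trainable weights when a single weight matrix is fully fine-tuned with the number when only a low-rank update is trained. This is a minimal sketch under stated assumptions: the abstract does not name a specific parameter-efficient method, so the low-rank update, the rank of 4, and the 768-wide layer are illustrative choices, and the function names are hypothetical.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a rank-r update B @ A, where B is d_out x r
    and A is r x d_in; the original matrix stays frozen."""
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of the layer's weights trained under the low-rank update."""
    return low_rank_params(d_out, d_in, rank) / full_finetune_params(d_out, d_in)


if __name__ == "__main__":
    d = 768  # assumed layer width, chosen for illustration only
    r = 4    # assumed rank, chosen for illustration only
    print(full_finetune_params(d, d))   # 589824
    print(low_rank_params(d, d, r))     # 6144
    print(trainable_fraction(d, d, r))  # roughly 0.0104, i.e. 1/96
```

Under these assumed sizes the low-rank update trains about 1% of the weights in the layer, which is the kind of saving that makes applying such updates to both the encoder and the decoder practical.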


URL

https://arxiv.org/abs/2404.09957

PDF

https://arxiv.org/pdf/2404.09957.pdf

