Abstract
Purpose: The recent Segment Anything Model (SAM) has demonstrated impressive performance with point, text, or bounding box prompts in various applications. However, in safety-critical surgical tasks, prompting is impractical because (i) per-frame prompts are not available for supervised learning, (ii) prompting frame by frame is unrealistic in a real-time tracking application, and (iii) annotating prompts for offline applications is expensive. Methods: We develop Surgical-DeSAM to generate automatic bounding box prompts for decoupling SAM, so that instrument segmentation can be obtained in real-time robotic surgery. We use a commonly used detection architecture, DETR, and fine-tune it to obtain bounding box prompts for the instruments. We then employ decoupling SAM (DeSAM), replacing the image encoder with the DETR encoder and fine-tuning the prompt encoder and mask decoder to obtain instance segmentation of the surgical instruments. To improve detection performance, we adopt the Swin Transformer for better feature representation. Results: The proposed method has been validated on two publicly available datasets from the MICCAI surgical instrument segmentation challenges, EndoVis 2017 and 2018. Compared with state-of-the-art (SOTA) instrument segmentation methods, our method demonstrates significant improvements, with Dice scores of 89.62 and 90.70 on EndoVis 2017 and 2018, respectively. Conclusion: Our extensive experiments and validations demonstrate that Surgical-DeSAM enables real-time instrument segmentation without any additional prompting and outperforms other SOTA segmentation methods.
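The hand-off from detector to prompt encoder described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the detector emits boxes in DETR's usual normalized (cx, cy, w, h) format and that the prompt encoder expects absolute (x0, y0, x1, y1) pixel corners, as in the public SAM release. The function names and the 0.5 score threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy_pixels(box: Box, width: int, height: int) -> Box:
    """Convert one normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1).

    Assumption: coordinates are normalized to [0, 1] relative to the image,
    which is DETR's usual output convention.
    """
    cx, cy, w, h = box
    x0 = (cx - w / 2) * width
    y0 = (cy - h / 2) * height
    x1 = (cx + w / 2) * width
    y1 = (cy + h / 2) * height
    # Clamp to the image so the prompt never points outside the frame.
    x0 = min(max(x0, 0.0), float(width))
    y0 = min(max(y0, 0.0), float(height))
    x1 = min(max(x1, 0.0), float(width))
    y1 = min(max(y1, 0.0), float(height))
    return (x0, y0, x1, y1)


def boxes_to_prompts(
    boxes: Sequence[Box],
    scores: Sequence[float],
    width: int,
    height: int,
    score_threshold: float = 0.5,  # hypothetical value, not from the paper
) -> List[Box]:
    """Keep confident detections and return them as pixel-space box prompts."""
    return [
        cxcywh_to_xyxy_pixels(box, width, height)
        for box, score in zip(boxes, scores)
        if score >= score_threshold
    ]


if __name__ == "__main__":
    detections = [(0.5, 0.5, 0.5, 0.5), (0.1, 0.1, 0.1, 0.1)]
    confidences = [0.9, 0.2]
    print(boxes_to_prompts(detections, confidences, width=200, height=100))
```

In this sketch, each frame's detections become box prompts with no human in the loop, which is the property the abstract relies on for real-time use.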
URL
https://arxiv.org/abs/2404.14040