Abstract
Medical image segmentation is a pivotal task in medical image analysis and computer vision. Current methods segment the major regions of interest reasonably well, but precise segmentation of boundary areas remains challenging. In this study, we propose a novel network architecture named CTO, which combines Convolutional Neural Networks (CNNs), Vision Transformer (ViT) models, and explicit edge detection operators to tackle this challenge. CTO surpasses existing methods in segmentation accuracy and strikes a better balance between accuracy and efficiency, without requiring additional data inputs or label injections. Specifically, CTO follows the canonical encoder-decoder paradigm. Its dual-stream encoder comprises a mainstream CNN stream that captures local features and an auxiliary StitchViT stream that integrates long-range dependencies. To improve the model's ability to learn boundary areas, we further introduce a boundary-guided decoder that uses binary boundary masks, generated by dedicated edge detection operators, as explicit guidance during decoding. We validate CTO through extensive experiments on seven challenging medical image segmentation datasets, including ISIC 2016, PH2, ISIC 2018, CoNIC, LiTS17, and BTCV. The results show that CTO achieves state-of-the-art accuracy on these datasets while maintaining competitive model complexity. The code has been publicly released.
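As a rough illustration of what "binary boundary masks generated by dedicated edge detection operators" can look like, the sketch below applies a Sobel operator to a binary mask and thresholds the gradient magnitude. The abstract does not say which operator CTO uses or what input it is applied to, so the choice of Sobel, the input being a binary mask, the function name `sobel_boundary_mask`, and the `threshold` parameter are all assumptions made for this example only.

```python
import numpy as np


def sobel_boundary_mask(mask: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Return a 0/1 boundary mask from a 2D binary mask (hypothetical helper).

    Assumption: a Sobel operator stands in for the unspecified
    "edge detection operator" mentioned in the abstract.
    """
    # Replicate the border so the image frame itself is not marked as an edge.
    padded = np.pad(mask.astype(np.float32), 1, mode="edge")

    # Standard 3x3 Sobel kernels for horizontal and vertical gradients.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T

    h, w = mask.shape
    gx = np.zeros((h, w), dtype=np.float32)
    gy = np.zeros((h, w), dtype=np.float32)

    # Sliding-window cross-correlation written as nine shifted sums.
    # (The sign differs from true convolution, but the magnitude does not.)
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window

    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)


# Usage: a 3x3 foreground square inside a 7x7 image.
mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 2:5] = 1
boundary = sobel_boundary_mask(mask)
```

In this toy case the interior pixel of the square and the far background come out as 0, while pixels on either side of the foreground/background transition come out as 1. A mask of this kind is the sort of signal a boundary-guided decoder could receive as explicit guidance.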
URL
https://arxiv.org/abs/2505.04652