Abstract
Automatic polyp segmentation is crucial for improving the clinical identification of colorectal cancer (CRC). Although Deep Learning (DL) techniques have been extensively researched for this problem, current methods frequently struggle to generalize, particularly in data-constrained or challenging settings. Moreover, many existing polyp segmentation methods rely on complex, task-specific architectures. To address these limitations, we present a framework that exploits the intrinsic robustness of DINO self-attention "key" features for segmentation. Unlike traditional methods that extract tokens from the deepest layers of the Vision Transformer (ViT), our approach feeds the key features of the self-attention module to a simple convolutional decoder that predicts polyp masks, resulting in enhanced performance and better generalizability. We validate our approach on a multi-center dataset under two rigorous protocols: Domain Generalization (DG) and Extreme Single Domain Generalization (ESDG). Our results, supported by a comprehensive statistical analysis, demonstrate that this pipeline achieves state-of-the-art (SOTA) performance and significantly enhances generalization, particularly in data-scarce and challenging scenarios. Without relying on a polyp-specific architecture, we surpass well-established models such as nnU-Net and UM-Net. Additionally, we provide a systematic benchmark of the DINO framework's evolution, quantifying the specific impact of architectural advancements on downstream polyp segmentation performance.
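The idea of reading out self-attention "key" features and passing them to a small convolutional decoder can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a fused query/key/value projection whose output is ordered [query, key, value], as in the public DINO ViT code, and a token sequence with a single [CLS] token followed by the patch tokens. The function names, the toy dimensions, the random weights, and the 1x1-convolution decoder are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def extract_key_features(tokens, qkv_weight, qkv_bias, grid_hw):
    """Return the self-attention keys of one ViT block as an (H, W, D) map.

    tokens:     (1 + H*W, D) normalized block input; row 0 is the [CLS] token
                (assumption: no extra register tokens).
    qkv_weight: (3*D, D) fused projection, rows ordered [query; key; value].
    qkv_bias:   (3*D,) bias of the fused projection.
    """
    dim = tokens.shape[1]
    height, width = grid_hw
    qkv = tokens @ qkv_weight.T + qkv_bias   # (N, 3*D)
    keys = qkv[:, dim:2 * dim]               # middle third holds the keys
    patch_keys = keys[1:]                    # drop the [CLS] row
    return patch_keys.reshape(height, width, dim)


def toy_decoder(key_map, conv_weight, conv_bias, patch_size):
    """Hypothetical decoder: 1x1 convolution, sigmoid, nearest-neighbour upsampling."""
    logits = key_map @ conv_weight + conv_bias        # (H, W), one logit per patch
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Repeat every patch probability over a patch_size x patch_size pixel block.
    return np.kron(probs, np.ones((patch_size, patch_size)))


# Toy setup: a 4x4 patch grid, 8-dimensional tokens, 16-pixel patches.
rng = np.random.default_rng(0)
dim, grid_hw, patch_size = 8, (4, 4), 16
tokens = rng.normal(size=(1 + grid_hw[0] * grid_hw[1], dim))
qkv_weight = rng.normal(size=(3 * dim, dim))
qkv_bias = rng.normal(size=3 * dim)

key_map = extract_key_features(tokens, qkv_weight, qkv_bias, grid_hw)
mask = toy_decoder(key_map, rng.normal(size=dim), 0.0, patch_size)
print(key_map.shape, mask.shape)
```

In a real pipeline the tokens and projection weights would come from a pretrained DINO backbone and the decoder would be trained on annotated polyp masks; the sketch only shows where the key features sit in the fused projection and how they map back onto the image grid.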
URL
https://arxiv.org/abs/2512.13376