Abstract
In digital pathology, the multiple instance learning (MIL) strategy is widely used for weakly supervised histopathology whole slide image (WSI) classification, where giga-pixel WSIs are labeled only at the slide level. However, existing attention-based MIL approaches often overlook contextual information and intrinsic spatial relationships between neighboring tissue tiles, while graph-based MIL frameworks have limited ability to capture long-range dependencies. In this paper, we introduce an integrative graph-transformer framework that simultaneously captures context-aware relational features and global WSI representations through a novel Graph Transformer Integration (GTI) block. Specifically, each GTI block consists of a Graph Convolutional Network (GCN) layer modeling neighboring relations at the local instance level and an efficient global attention model capturing comprehensive global information from extensive feature embeddings. Extensive experiments on three publicly available WSI datasets (TCGA-NSCLC, TCGA-RCC, and BRIGHT) demonstrate the superiority of our approach over current state-of-the-art MIL methods, achieving an improvement of 1.0% to 2.6% in accuracy and 0.7% to 1.6% in AUROC.
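As a rough illustration of the two components the abstract describes, the sketch below chains a standard GCN layer (local neighborhood aggregation over a tile adjacency graph) into scaled dot-product self-attention over all tile embeddings (global mixing). This is a minimal, hypothetical numpy sketch of the general pattern, not the paper's actual GTI block; all function names, dimensions, and the random graph are my own assumptions.

```python
import numpy as np

def gcn_layer(H, A, W):
    """Graph convolution with symmetric normalization:
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU

def global_attention(H, Wq, Wk, Wv):
    """Scaled dot-product self-attention over all instances (tiles)."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
n, d = 6, 8                                        # 6 tiles, 8-dim embeddings (toy sizes)
H = rng.standard_normal((n, d))                    # tile feature embeddings
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.maximum(A, A.T)                             # symmetric spatial adjacency
W, Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(4))

out = global_attention(gcn_layer(H, A, W), Wq, Wk, Wv)
print(out.shape)  # (6, 8)
```

In the paper's framework, stacking such blocks lets local spatial context (via the GCN) and long-range dependencies (via attention) inform each other before slide-level pooling and classification.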
URL
https://arxiv.org/abs/2403.18134