Paper Reading AI Learner

Integrative Graph-Transformer Framework for Histopathology Whole Slide Image Representation and Classification

2024-03-26 22:31:05
Zhan Shi, Jingwei Zhang, Jun Kong, Fusheng Wang

Abstract

In digital pathology, multiple instance learning (MIL) is widely used for the weakly supervised classification of histopathology whole slide images (WSIs), where giga-pixel WSIs are labeled only at the slide level. However, existing attention-based MIL approaches often overlook contextual information and the intrinsic spatial relationships between neighboring tissue tiles, while graph-based MIL frameworks have limited power to recognize long-range dependencies. In this paper, we introduce an integrative graph-transformer framework that simultaneously captures context-aware relational features and global WSI representations through a novel Graph Transformer Integration (GTI) block. Specifically, each GTI block consists of a Graph Convolutional Network (GCN) layer that models neighboring relations at the local instance level and an efficient global attention model that captures comprehensive global information from extensive feature embeddings. Extensive experiments on three publicly available WSI datasets (TCGA-NSCLC, TCGA-RCC, and BRIGHT) demonstrate the superiority of our approach over current state-of-the-art MIL methods, with improvements of 1.0% to 2.6% in accuracy and 0.7% to 1.6% in AUROC.
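The GTI block described above — a GCN layer that aggregates features over neighboring tiles, followed by a global attention model over all tile embeddings — might be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the parameter names and shapes are assumptions, and plain scaled dot-product self-attention stands in for the paper's "efficient global attention" variant, whose details are not given in the abstract.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One GCN layer: symmetric-normalized adjacency with self-loops,
    D^{-1/2} (A + I) D^{-1/2} X W, followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = (A_hat * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)

def global_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over all tile embeddings
    (a stand-in for the paper's efficient global attention)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Numerically stable row-wise softmax.
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def gti_block(X, A, params):
    """Hypothetical GTI block: local GCN aggregation, then global attention."""
    H = gcn_layer(X, A, params["W_gcn"])      # context from neighboring tiles
    return global_attention(H, params["Wq"], params["Wk"], params["Wv"])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tiles, d_in, d_out = 5, 8, 4
    X = rng.standard_normal((n_tiles, d_in))      # tile feature embeddings
    A = np.zeros((n_tiles, n_tiles))              # ring graph of adjacent tiles
    for i in range(n_tiles):
        A[i, (i + 1) % n_tiles] = A[(i + 1) % n_tiles, i] = 1.0
    params = {
        "W_gcn": rng.standard_normal((d_in, d_in)),
        "Wq": rng.standard_normal((d_in, d_out)),
        "Wk": rng.standard_normal((d_in, d_out)),
        "Wv": rng.standard_normal((d_in, d_out)),
    }
    print(gti_block(X, A, params).shape)          # one embedding per tile
```

In a real WSI pipeline the adjacency matrix would encode spatial neighborhood between tissue tiles, and stacked GTI blocks would feed a slide-level pooling and classification head; here the graph and weights are random placeholders.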

URL

https://arxiv.org/abs/2403.18134

PDF

https://arxiv.org/pdf/2403.18134.pdf

