Aggregation Schemes for Single-Vector WSI Representation Learning in Digital Pathology

2025-01-29 18:14:51
Sobhan Hemati, Ghazal Alabtah, Saghir Alfasly, H. R. Tizhoosh

Abstract

A crucial step in efficiently integrating Whole Slide Images (WSIs) into computational pathology is assigning a single high-quality feature vector, i.e., one embedding, to each WSI. With many pre-trained deep neural networks available and foundation models emerging, extracting embeddings for sub-images (i.e., tiles or patches) is straightforward. However, because WSIs are high-resolution, gigapixel images, processing one on existing GPUs as a single image is not feasible. As a result, WSIs are usually split into many patches. Once each patch is fed to a pre-trained model, each WSI can be represented by a set of patch embeddings. In such a setup, WSI representation learning reduces to set representation learning, where each WSI is available as a set of patch embeddings. To obtain a single embedding from this set for each WSI, multiple set-based learning schemes have been proposed in the literature. In this paper, we evaluate the WSI search performance of multiple recently developed aggregation techniques (mainly set representation learning techniques), including simple average or max pooling operations, Deep Sets, Memory networks, Focal attention, Gaussian Mixture Model (GMM) Fisher Vector, and deep sparse and binary Fisher Vector, on four different primary sites from TCGA: bladder, breast, kidney, and colon. Further, we benchmark the search performance of these methods against the median of minimum distances of patch embeddings, a non-aggregating approach used for WSI retrieval.
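As a rough illustration of the two simplest ideas named in the abstract, the sketch below pools a set of patch embeddings into one vector (average and max pooling) and computes a median-of-minimum-distances score between two WSIs without aggregating them. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the use of Euclidean distance, the random stand-in embeddings, and the exact definition of the median-of-minimum score (for each query patch, take the distance to its nearest candidate patch, then take the median over query patches) are all assumptions made for illustration.

```python
import numpy as np


def mean_pool(patch_embeddings: np.ndarray) -> np.ndarray:
    """Average pooling: (n_patches, dim) -> (dim,)."""
    return patch_embeddings.mean(axis=0)


def max_pool(patch_embeddings: np.ndarray) -> np.ndarray:
    """Max pooling: element-wise maximum over patches, (n_patches, dim) -> (dim,)."""
    return patch_embeddings.max(axis=0)


def median_of_min_distances(query_set: np.ndarray, candidate_set: np.ndarray) -> float:
    """Non-aggregating WSI-to-WSI distance (assumed definition).

    For every query patch, find the Euclidean distance to its nearest
    candidate patch, then return the median of those minima.
    """
    # Pairwise differences via broadcasting: (n_query, n_candidate, dim)
    diffs = query_set[:, None, :] - candidate_set[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)  # (n_query, n_candidate)
    return float(np.median(dists.min(axis=1)))


# Hypothetical data: random vectors standing in for patch embeddings
# produced by a pre-trained model (5 and 7 patches, 8 dimensions each).
rng = np.random.default_rng(0)
wsi_a = rng.normal(size=(5, 8))
wsi_b = rng.normal(size=(7, 8))

# Single-vector representations, compared with one distance computation.
emb_a, emb_b = mean_pool(wsi_a), mean_pool(wsi_b)
pooled_distance = float(np.linalg.norm(emb_a - emb_b))

# Set-to-set comparison that keeps every patch embedding.
set_distance = median_of_min_distances(wsi_a, wsi_b)
```

The trade-off this makes visible: pooled embeddings cost one vector comparison per WSI pair, whereas the set-based score needs all pairwise patch distances, which grows with the product of the two patch counts.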

URL

https://arxiv.org/abs/2501.17822

PDF

https://arxiv.org/pdf/2501.17822.pdf
