Paper Reading AI Learner

Prompt-MIL: Boosting Multi-Instance Learning Schemes via Task-specific Prompt Tuning

2023-03-21 22:24:27
Jingwei Zhang, Saarthak Kapse, Ke Ma, Prateek Prasanna, Joel Saltz, Maria Vakalopoulou, Dimitris Samaras

Abstract

Whole slide image (WSI) classification is a critical task in computational pathology, requiring the processing of gigapixel-sized images, which is challenging for current deep-learning methods. Current state-of-the-art methods are based on multi-instance learning (MIL) schemes, which usually rely on pretrained features to represent the instances. Due to the lack of task-specific annotated data, these features are either obtained from well-established backbones on natural images, or, more recently, from self-supervised models pretrained on histopathology. However, both approaches yield task-agnostic features, resulting in performance loss compared to the appropriate task-related supervision, if available. In this paper, we show that when task-specific annotations are limited, we can inject such supervision into downstream task training, to reduce the gap between fully task-tuned and task-agnostic features. We propose Prompt-MIL, an MIL framework that integrates prompts into WSI classification. Prompt-MIL adopts a prompt-tuning mechanism, where only a small fraction of parameters calibrates the pretrained features to encode task-specific information, rather than the conventional full fine-tuning approaches. Extensive experiments on three WSI datasets, TCGA-BRCA, TCGA-CRC, and BRIGHT, demonstrate the superiority of Prompt-MIL over conventional MIL methods, achieving a relative improvement of 1.49%-4.03% in accuracy and 0.25%-8.97% in AUROC while using fewer than 0.3% additional parameters. Compared to conventional full fine-tuning approaches, we fine-tune less than 1.3% of the parameters, yet achieve a relative improvement of 1.29%-13.61% in accuracy and 3.22%-27.18% in AUROC, reduce GPU memory consumption by 38%-45%, and train 21%-27% faster.
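The prompt-tuning mechanism the abstract describes can be illustrated with a minimal PyTorch sketch: a frozen transformer encoder whose input sequence is prepended with a handful of learnable prompt tokens, so that only those tokens (plus a task head, in the full framework) are trained. This is a hedged illustration of the general technique, not the paper's actual implementation; the class name `PromptedEncoder` and all hyperparameters are made up for the example.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Illustrative prompt tuning: freeze a pretrained-style encoder and
    learn only a few prompt tokens prepended to the patch sequence."""

    def __init__(self, embed_dim=64, num_prompts=4, depth=2, num_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Freeze the backbone: in prompt tuning, its weights stay fixed.
        for p in self.encoder.parameters():
            p.requires_grad = False
        # The only trainable parameters in this module: prompt tokens.
        self.prompts = nn.Parameter(torch.empty(1, num_prompts, embed_dim))
        nn.init.normal_(self.prompts, std=0.02)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, embed_dim) instance features.
        b = patch_tokens.size(0)
        x = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        x = self.encoder(x)
        # Drop the prompt positions; return calibrated patch features,
        # which an MIL aggregator would then pool into a slide prediction.
        return x[:, self.prompts.size(1):, :]

model = PromptedEncoder().eval()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.4f}")  # well below 1%
with torch.no_grad():
    feats = model(torch.randn(2, 16, 64))
print(feats.shape)  # torch.Size([2, 16, 64])
```

The toy numbers already show the parameter-efficiency argument: the 4 prompt tokens account for far less than 1% of the module's parameters, which is the same mechanism that lets Prompt-MIL report under 0.3% additional parameters relative to conventional MIL.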


URL

https://arxiv.org/abs/2303.12214

PDF

https://arxiv.org/pdf/2303.12214.pdf

