
Knowledge Distillation for Adaptive MRI Prostate Segmentation Based on Limit-Trained Multi-Teacher Models

2023-03-16 17:15:08
Eddardaa Ben Loussaief, Hatem Rashwan, Mohammed Ayad, Mohammed Zakaria Hassan, Domenec Puig

Abstract

The performance of deep models on numerous medical tasks has improved considerably in recent years. These models are often capable learners, yet their intricate architectures and high computational cost make them difficult to deploy in clinical settings, particularly on resource-limited devices. To address this issue, Knowledge Distillation (KD) has been proposed as a compression and acceleration technique. KD is an efficient learning strategy that transfers knowledge from a cumbersome model (the teacher) to a lightweight model (the student), yielding a compact model with few parameters while preserving the teacher's performance. In this work, we develop a KD-based deep model for prostate MRI segmentation that combines feature-based distillation with Kullback-Leibler divergence, Lovász, and Dice losses. We demonstrate its effectiveness under two compression procedures: 1) distilling knowledge to a student model from a single well-trained teacher, and 2) since most medical applications have only small datasets, training multiple teachers, each on a small set of images, to learn an adaptive student model that is as close to the teachers as possible while meeting the desired accuracy and inference speed. Extensive experiments on a public multi-site prostate tumor dataset show that the proposed adaptive KD strategy improves the Dice similarity score by 9%, outperforming all tested well-established baseline models.
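To make the loss design concrete, below is a minimal PyTorch-style sketch (not the authors' code) of a combined distillation objective of the kind the abstract describes: a temperature-scaled KL term between teacher and student logits, a soft Dice term against the ground truth, and an L2 feature-matching term for feature-based distillation. The Lovász term is only noted in a comment, the multi-teacher aggregation shown is a plain logit average rather than the paper's adaptation scheme, and all function names and weights (temperature, alpha, beta) are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def dice_loss(probs, target, eps=1e-6):
        # Soft Dice loss for a binary mask; probs and target are (B, H, W).
        inter = (probs * target).sum()
        union = probs.sum() + target.sum()
        return 1.0 - (2.0 * inter + eps) / (union + eps)

    def distillation_loss(student_logits, teacher_logits,
                          student_feats, teacher_feats, target,
                          temperature=4.0, alpha=0.5, beta=0.1):
        # Soft-label term: KL divergence between temperature-softened
        # student and teacher distributions, scaled by T^2 as in
        # Hinton et al.'s original KD formulation.
        t = temperature
        kl = F.kl_div(
            F.log_softmax(student_logits / t, dim=1),
            F.softmax(teacher_logits / t, dim=1),
            reduction="batchmean",
        ) * (t * t)
        # Hard-label term: soft Dice on the foreground channel, assuming
        # a two-class output of shape (B, 2, H, W).
        fg_probs = torch.softmax(student_logits, dim=1)[:, 1]
        dice = dice_loss(fg_probs, target.float())
        # Feature-based distillation: L2 matching of intermediate maps
        # (assumed already projected to the same shape).
        feat = F.mse_loss(student_feats, teacher_feats)
        # A Lovász-Softmax term (Berman et al., 2018) would be added
        # alongside Dice in the same fashion; omitted here for brevity.
        return alpha * kl + (1.0 - alpha) * dice + beta * feat

    def ensemble_teacher_logits(per_teacher_logits):
        # One simple aggregation for the multi-teacher setting: average
        # the teachers' logits. The paper's adaptive weighting of its
        # limit-trained teachers may well differ from this plain mean.
        return torch.stack(per_teacher_logits).mean(dim=0)

In the multi-teacher setting, the student would be trained against ensemble_teacher_logits of the individually trained teachers; the single-teacher procedure is the same call with one teacher's logits passed directly.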


URL

https://arxiv.org/abs/2303.09494

PDF

https://arxiv.org/pdf/2303.09494.pdf

