
Rethinking Model Prototyping through the MedMNIST+ Dataset Collection

2024-04-24 10:19:25
Sebastian Doerrich, Francesco Di Salvo, Julius Brockmann, Christian Ledig


The integration of deep learning-based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conducts a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating for the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures, emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as future research within the field. Code will be released soon.
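To make the resolution comparison concrete, the following is a minimal sketch of what a "same data, several input resolutions" evaluation loop might look like during prototyping. Everything in it is an assumption for illustration: the images are synthetic two-class noise patterns rather than MedMNIST+ data, the classifier is a simple training-free nearest-centroid rule standing in for a cheap training scheme, and all function names (make_toy_images, downsample, nearest_centroid_accuracy, evaluate_resolutions) are hypothetical. It is not the authors' benchmark code and does not reproduce their results.

```python
import random


def make_toy_images(n_per_class, size, seed=0):
    """Synthetic stand-in data (assumption): class 0 is brighter on the
    left half of the image, class 1 is brighter on the right half."""
    rng = random.Random(seed)
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            img = [
                [
                    rng.gauss(0.0, 1.0)
                    + (1.0 if (c < size // 2) == (label == 0) else 0.0)
                    for c in range(size)
                ]
                for _ in range(size)
            ]
            data.append((img, label))
    rng.shuffle(data)
    return data


def downsample(img, factor):
    """Average-pool a square image by an integer factor."""
    size = len(img) // factor
    return [
        [
            sum(
                img[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            )
            / factor ** 2
            for c in range(size)
        ]
        for r in range(size)
    ]


def flatten(img):
    """Turn a 2D image into a flat feature vector."""
    return [v for row in img for v in row]


def nearest_centroid_accuracy(train, test):
    """Training-free classifier: assign each test vector to the closest class mean."""
    centroids = {}
    for label in {y for _, y in train}:
        vecs = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
    correct = 0
    for x, y in test:
        pred = min(
            centroids,
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
        )
        correct += pred == y
    return correct / len(test)


def evaluate_resolutions(base_size=32, factors=(1, 2, 4, 8)):
    """Evaluate the same split at several resolutions; returns {resolution: accuracy}."""
    data = make_toy_images(n_per_class=40, size=base_size)
    split = len(data) // 2
    results = {}
    for f in factors:
        feats = [(flatten(downsample(img, f)), y) for img, y in data]
        results[base_size // f] = nearest_centroid_accuracy(feats[:split], feats[split:])
    return results


if __name__ == "__main__":
    for res, acc in evaluate_resolutions().items():
        print(f"{res}x{res}: accuracy={acc:.2f}")
```

In a real study, the synthetic generator would be replaced by the actual datasets at their provided resolutions, and the nearest-centroid rule by whichever training schemes are being compared. The point of the sketch is only the loop structure: hold the data split fixed and vary the resolution, so that any accuracy difference can be attributed to the input size.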



