Rethinking Model Prototyping through the MedMNIST+ Dataset Collection

Abstract

The integration of deep learning-based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conduct a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating for the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures, emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as future research within the field. Code will be released soon.

URL

https://arxiv.org/abs/2404.15786

PDF

https://arxiv.org/pdf/2404.15786.pdf
