Abstract
While the field of medical image analysis has undergone a transformative shift with the integration of machine learning techniques, a major obstacle to applying these techniques is often the scarcity of large, diverse, and well-annotated datasets. Medical images vary in format, size, and other parameters and therefore require extensive preprocessing and standardization before they can be used in machine learning. To address these challenges, we introduce the Medical Imaging Meta-Dataset (MedIMeta), a novel multi-domain, multi-task meta-dataset. MedIMeta contains 19 medical imaging datasets spanning 10 different domains and encompassing 54 distinct medical tasks, all standardized to the same format and readily usable in PyTorch or other ML frameworks. We perform a technical validation of MedIMeta and demonstrate its utility through fully supervised and cross-domain few-shot learning baselines.
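Few-shot learning baselines are commonly evaluated on small "episodes": N classes are drawn from a task, and each class contributes K labeled support samples plus a few query samples to classify. The following is a minimal, generic sketch of that sampling step, assuming only a flat list of integer labels per task. It is not the MedIMeta API and not necessarily the paper's exact evaluation protocol. The function `sample_episode` and its parameters are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, seed=None):
    """Hypothetical helper: sample one N-way K-shot episode from integer labels.

    Returns (support, query): lists of (sample_index, episode_label) pairs,
    where episode_label re-indexes the chosen classes to 0..n_way-1.
    """
    rng = random.Random(seed)
    # Group sample indices by their original class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # A class is eligible only if it can fill both the support and query sets.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        # Draw disjoint support and query samples for this class.
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query
```

For example, `sample_episode([0, 0, 0, 1, 1, 1], n_way=2, k_shot=1, n_query=2, seed=0)` returns two support pairs and four query pairs. Because every task in a standardized meta-dataset exposes labels in the same form, one sampler like this can in principle be reused across domains.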
URL
https://arxiv.org/abs/2404.16000