Abstract
Self-supervised learning is an efficient pre-training method for medical image analysis. However, current research is mostly confined to pre-training on data from a single modality, consuming considerable time and resources without achieving universality across modalities. A straightforward solution is to combine all modality data for joint self-supervised pre-training, but this poses practical challenges. First, our experiments reveal conflicts in representation learning as the number of modalities increases. Second, multi-modal data collected in advance cannot cover all real-world scenarios. In this paper, we reconsider versatile self-supervised learning from the perspective of continual learning and propose MedCoSS, a continuous self-supervised learning approach for multi-modal medical data. Unlike joint self-supervised learning, MedCoSS assigns data from different modalities to different training stages, forming a multi-stage pre-training process. To balance modal conflicts and prevent catastrophic forgetting, we propose a rehearsal-based continual learning method. We introduce a k-means sampling strategy to retain data from previous modalities and rehearse them when learning new modalities. Instead of executing the pretext task on buffer data, a feature distillation strategy and an intra-modal mixup strategy are applied to these data for knowledge retention. We conduct continuous self-supervised pre-training on a large-scale multi-modal unlabeled dataset, including clinical reports, X-rays, CT scans, MRI scans, and pathological images. Experimental results demonstrate MedCoSS's exceptional generalization ability across nine downstream datasets and its significant scalability in integrating new modality data. Code and pre-trained weights are available at this https URL.
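As a rough illustration of the k-means sampling strategy mentioned above, the sketch below clusters feature vectors of a previous modality and keeps the sample nearest to each centroid as the rehearsal buffer. This is a hypothetical NumPy sketch, not the paper's implementation; the function name, the plain Lloyd's-iteration k-means, and the nearest-to-centroid selection rule are all assumptions for illustration.

```python
import numpy as np

def kmeans_sample(features, buffer_size, n_iter=20, seed=0):
    """Hypothetical sketch of k-means rehearsal sampling: cluster the
    feature vectors of a previous modality into `buffer_size` clusters
    and return the indices of the samples closest to each centroid."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=np.float64)
    # Initialize centroids with randomly chosen distinct samples.
    centroids = features[rng.choice(len(features), buffer_size, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        for k in range(buffer_size):
            members = features[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    # Keep the real sample nearest to each centroid as the buffer.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    return np.unique(dists.argmin(axis=0))
```

Selecting real samples (rather than the centroids themselves) keeps the buffer usable as training data while still covering the feature distribution of the old modality.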
URL
https://arxiv.org/abs/2311.17597