Paper Reading AI Learner

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

2023-11-29 12:47:42
Yiwen Ye, Yutong Xie, Jianpeng Zhang, Ziyang Chen, Qi Wu, Yong Xia

Abstract

Self-supervised learning is an efficient pre-training method for medical image analysis. However, current research is mostly confined to pre-training on data from a single modality, consuming considerable time and resources without achieving universality across modalities. A straightforward solution is to combine data from all modalities for joint self-supervised pre-training, but this poses practical challenges. First, our experiments reveal conflicts in representation learning as the number of modalities increases. Second, multi-modal data collected in advance cannot cover all real-world scenarios. In this paper, we reconsider versatile self-supervised learning from the perspective of continual learning and propose MedCoSS, a continual self-supervised learning approach for multi-modal medical data. Unlike joint self-supervised learning, MedCoSS assigns data from different modalities to different training stages, forming a multi-stage pre-training process. To balance modal conflicts and prevent catastrophic forgetting, we propose a rehearsal-based continual learning method: a k-means sampling strategy retains data from previous modalities in a buffer, and this data is rehearsed when learning new modalities. Instead of executing the pretext task on the buffer data, we apply a feature distillation strategy and an intra-modal mixup strategy to it for knowledge retention. We conduct continual self-supervised pre-training on a large-scale multi-modal unlabeled dataset comprising clinical reports, X-rays, CT scans, MRI scans, and pathological images. Experimental results demonstrate MedCoSS's exceptional generalization ability across nine downstream datasets and its significant scalability in integrating new modality data. Code and pre-trained weights are available at this https URL.
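The rehearsal pipeline the abstract describes (k-means sampling of a buffer, intra-modal mixup, feature distillation) can be sketched in a few lines. The following is a minimal illustrative sketch in numpy, not the authors' implementation: the function names, the farthest-point initialization, the fixed mixup coefficient, and the plain MSE distillation loss are all assumptions made for illustration; see the paper and released code for the actual method.

```python
import numpy as np

def _farthest_point_init(features, k):
    # Deterministic farthest-point initialization: start from sample 0,
    # then repeatedly add the sample farthest from the chosen centers.
    chosen = [0]
    for _ in range(k - 1):
        d = np.linalg.norm(features[:, None] - features[chosen][None], axis=-1)
        chosen.append(int(d.min(axis=1).argmax()))
    return features[chosen].astype(float)

def kmeans_sample(features, k, iters=10):
    # Run a small k-means over the feature matrix, then keep the index of
    # the sample nearest each centroid as the rehearsal buffer entries.
    centroids = _farthest_point_init(features, k)
    for _ in range(iters):
        d = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    d = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
    return np.unique(d.argmin(axis=0))  # buffer sample indices

def intra_modal_mixup(x1, x2, lam=0.7):
    # Mix two buffer samples drawn from the SAME previous modality.
    return lam * x1 + (1.0 - lam) * x2

def feature_distillation_loss(student_feats, teacher_feats):
    # Match the current model's features on buffer data to those of a
    # frozen copy saved before the new stage began (plain MSE here).
    return float(np.mean((student_feats - teacher_feats) ** 2))
```

In MedCoSS proper, `features` would be encoder representations of a modality's unlabeled data, and the distillation target would come from a frozen snapshot of the model taken before the new modality's training stage starts, so the pretext task itself never needs to run on the buffer.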


URL

https://arxiv.org/abs/2311.17597

PDF

https://arxiv.org/pdf/2311.17597.pdf
