Abstract
Current volumetric biomedical foundation models struggle to generalize because public 3D datasets are small and do not cover the broad diversity of medical procedures, conditions, anatomical regions, and imaging protocols. We address this with a representation learning method that instead anticipates strong domain shifts at training time. We first propose a data engine that synthesizes highly variable training samples, enabling generalization to new biomedical contexts. To then train a single 3D network for any voxel-level task, we develop a contrastive learning method that pretrains the network to be stable against nuisance imaging variation simulated by the data engine, a key inductive bias for generalization. The network's features can be used as robust representations of input images for downstream tasks, and its weights provide a strong, dataset-agnostic initialization for finetuning on new datasets. As a result, we set new standards across both multimodality registration and few-shot segmentation, a first for any 3D biomedical vision model, all without (pre-)training on any existing dataset of real images.
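The core technical idea here is a contrastive pretraining objective that rewards voxel-wise feature stability under simulated nuisance imaging variation. Below is a minimal PyTorch sketch of that general idea, not the paper's actual data engine or training code: the encoder, the toy intensity augmentations, and all names (Tiny3DEncoder, nuisance_augment, voxelwise_contrastive_loss) are hypothetical placeholders for illustration.

```python
# Minimal sketch of contrastive invariance pretraining for a 3D network:
# two nuisance-augmented views of the same synthetic volume should yield
# matching voxel-wise features. All components are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tiny3DEncoder(nn.Module):
    """Toy stand-in for a 3D network mapping a volume to voxel-wise features."""
    def __init__(self, feat_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, feat_dim, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def nuisance_augment(vol: torch.Tensor) -> torch.Tensor:
    """Crude simulation of nuisance appearance variation (random gain,
    bias, and noise); a real data engine would be far richer."""
    gain = 0.5 + torch.rand(vol.shape[0], 1, 1, 1, 1)
    bias = 0.2 * torch.randn(vol.shape[0], 1, 1, 1, 1)
    return gain * vol + bias + 0.05 * torch.randn_like(vol)

def voxelwise_contrastive_loss(f1, f2, temperature=0.1, n_samples=256):
    """InfoNCE over sampled voxels: the same location across the two views
    is a positive pair; all other sampled voxels act as negatives."""
    b, c = f1.shape[0], f1.shape[1]
    f1 = f1.reshape(b, c, -1)                      # (B, C, V)
    f2 = f2.reshape(b, c, -1)
    idx = torch.randint(f1.shape[-1], (n_samples,))  # same voxels in both views
    z1 = F.normalize(f1[..., idx], dim=1)          # (B, C, N)
    z2 = F.normalize(f2[..., idx], dim=1)
    z1 = z1.permute(0, 2, 1).reshape(-1, c)        # (B*N, C)
    z2 = z2.permute(0, 2, 1).reshape(-1, c)
    logits = z1 @ z2.t() / temperature             # similarity matrix
    targets = torch.arange(logits.shape[0])        # positives on the diagonal
    return F.cross_entropy(logits, targets)

# One pretraining step; a random tensor stands in for a synthesized volume.
encoder = Tiny3DEncoder()
synthetic_vol = torch.rand(2, 1, 32, 32, 32)       # (B, 1, D, H, W)
loss = voxelwise_contrastive_loss(
    encoder(nuisance_augment(synthetic_vol)),
    encoder(nuisance_augment(synthetic_vol)),
)
loss.backward()
```

Because both views share the same underlying synthetic volume, minimizing this loss pushes the features to ignore the simulated imaging variation while staying discriminative across voxel locations, which is the invariance-as-inductive-bias property the abstract describes.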
URL
https://arxiv.org/abs/2411.02372