Abstract
Self-supervised learning (SSL) methods are popular because they can address situations with limited annotated data by directly utilising the underlying data distribution. However, such methods remain underexplored in ultrasound (US) imaging, especially for fetal assessment. We investigate the potential of dual-encoder SSL in utilising unlabelled US video data to improve performance on the challenging downstream task of Standard Fetal Cardiac Planes (SFCP) classification using limited labelled 2D US images. We study 7 SSL approaches based on reconstruction, contrastive loss, distillation, and information theory, and evaluate them extensively on a large private US dataset. Our observations and findings are consolidated from more than 500 downstream training experiments under different settings. Our primary observation is that, for SSL training, the variance of the dataset is more crucial than its size, because it allows the model to learn generalisable representations that improve the performance of downstream tasks. Overall, the BarlowTwins method shows robust performance, irrespective of the training settings and data variations, when used as an initialisation for downstream tasks. Notably, with full fine-tuning on 1% of the labelled data, BarlowTwins initialisation outperforms ImageNet initialisation by 12% in F1-score and other SSL initialisations by at least 4% in F1-score, making it a promising candidate for transfer learning from US video to image data.
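For readers unfamiliar with BarlowTwins, the information-theory-based method singled out above, the following is a minimal sketch of its redundancy-reduction objective as it is commonly described, not the implementation used in this study. It standardises the embeddings of two augmented views over the batch, builds their cross-correlation matrix, and penalises deviation from the identity matrix. The function names, the toy embeddings, and the weighting `lam` are illustrative assumptions.

```python
import math


def standardize(z):
    """Scale each embedding dimension to zero mean and unit std over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        std = std if std > 0 else 1.0  # guard against constant dimensions
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins-style loss for two batches of embeddings (lists of rows).

    z_a and z_b hold the embeddings of two augmented views of the same
    samples; lam (an assumed value here) weights the off-diagonal term.
    """
    n, d = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)
    # c[i][j]: correlation between dimension i of view A and dimension j of view B
    c = [[sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
         for i in range(d)]
    # Invariance term: matching dimensions should correlate perfectly.
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    # Redundancy term: different dimensions should be decorrelated.
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lam * off_diag


# Toy batch (hypothetical): two identical views with decorrelated dimensions.
views = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_identical = barlow_twins_loss(views, views)
```

In this toy case the cross-correlation matrix is the identity, so both terms vanish. In practice the two views would come from different augmentations of the same US frames passed through the encoder.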
URL
https://arxiv.org/abs/2407.21738