Abstract
Deep learning technologies have already demonstrated a high potential to build diagnosis support systems from medical imaging data, such as Chest X-Ray images. However, the shortage of labeled data in the medical field is a key obstacle to narrowing the performance gap with respect to applications in other image domains. In this work, we investigate the benefits of a curricular Self-Supervised Learning (SSL) pretraining scheme with respect to fully-supervised training regimes for pneumonia recognition on Chest X-Ray images of Covid-19 patients. We show that curricular SSL pretraining, which leverages unlabeled data, outperforms models trained from scratch or pretrained on ImageNet, indicating the potential for performance gains from SSL pretraining on massive unlabeled datasets. Finally, we demonstrate that top-performing SSL-pretrained models show a higher degree of attention in the lung regions, which suggests that they may be more robust to the external confounding factors in the training datasets identified by previous works.
Abstract (translated)
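Deep learning techniques have demonstrated great potential for building diagnosis support systems from medical imaging data such as chest X-rays. However, the lack of labeled data in the medical field is a key obstacle to narrowing the gap with applications in other image domains. In this study, we examine how a curricular self-supervised learning (SSL) pretraining scheme compares with fully supervised training regimes for pneumonia recognition on chest X-rays of COVID-19 patients. We show that curricular SSL pretraining, which leverages unlabeled data, outperforms models trained from scratch or pretrained on ImageNet, indicating the potential for performance gains from SSL pretraining on large-scale unlabeled datasets. Finally, we show that the top-performing SSL-pretrained models exhibit a higher degree of attention in the lung regions, and that these models may be more robust to external confounding factors that may be present in the training datasets.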
URL
https://arxiv.org/abs/2301.10687