BRAIxDet: Learning to Detect Malignant Breast Lesion with Incomplete Annotations

Abstract

Methods to detect malignant lesions from screening mammograms are usually trained with fully annotated datasets, where images are labelled with the localisation and classification of cancerous lesions. However, real-world screening mammogram datasets commonly have a subset that is fully annotated and another subset that is weakly annotated with just the global classification (i.e., without lesion localisation). Given the large size of such datasets, researchers usually face a dilemma with the weakly annotated subset: to not use it or to fully annotate it. The first option will reduce detection accuracy because it does not use the whole dataset, and the second option is too expensive given that the annotation needs to be done by expert radiologists. In this paper, we propose a middle-ground solution for the dilemma, which is to formulate the training as a weakly- and semi-supervised learning problem that we refer to as malignant breast lesion detection with incomplete annotations. To address this problem, our new method comprises two stages, namely: 1) pre-training a multi-view mammogram classifier with weak supervision from the whole dataset, and 2) extending the trained classifier to become a multi-view detector that is trained with semi-supervised student-teacher learning, where the training set contains fully and weakly annotated mammograms. We provide extensive detection results on two real-world screening mammogram datasets containing incomplete annotations, and show that our proposed approach achieves state-of-the-art results in the detection of malignant breast lesions with incomplete annotations.
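
The abstract only outlines the training recipe. Below is a minimal Python sketch, under stated assumptions, of how the two kinds of annotation could feed the two stages. It is not the authors' implementation. The `Mammogram` record, the box format, the exponential-moving-average (EMA) teacher update borrowed from Mean Teacher-style methods, the 0.5 score threshold, and the rule that weakly annotated non-malignant images receive no box targets are all hypothetical choices made for illustration. The multi-view backbone and detector heads are omitted, and model parameters are represented as plain floats keyed by name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Mammogram:
    """One training image (hypothetical record; multi-view pairing omitted)."""
    image_id: str
    malignant: bool                    # global classification label, known for every image
    boxes: Optional[List[Box]] = None  # lesion localisation; None if weakly annotated


def stage1_classification_labels(dataset: List[Mammogram]) -> Dict[str, int]:
    """Stage 1: weak supervision from the whole dataset, using only global labels."""
    return {m.image_id: int(m.malignant) for m in dataset}


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               decay: float = 0.999) -> Dict[str, float]:
    """Assumed Mean Teacher-style update: teacher <- decay * teacher + (1 - decay) * student."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}


def stage2_detection_targets(sample: Mammogram,
                             teacher_predictions: List[Tuple[Box, float]],
                             score_threshold: float = 0.5) -> List[Box]:
    """Stage 2: box targets for the student detector under a hypothetical rule."""
    if sample.boxes is not None:
        # Fully annotated: train on the ground-truth lesion boxes.
        return list(sample.boxes)
    if not sample.malignant:
        # Weakly annotated and not malignant: assume there is no malignant lesion to find.
        return []
    # Weakly annotated but malignant: keep confident teacher detections as pseudo-labels.
    return [box for box, score in teacher_predictions if score >= score_threshold]


if __name__ == "__main__":
    data = [
        Mammogram("a", True, [(10.0, 10.0, 50.0, 50.0)]),  # fully annotated
        Mammogram("b", True),                              # weakly annotated, malignant
        Mammogram("c", False),                             # weakly annotated, not malignant
    ]
    teacher_out = [((12.0, 14.0, 48.0, 52.0), 0.9), ((200.0, 200.0, 230.0, 240.0), 0.3)]
    print(stage1_classification_labels(data))              # {'a': 1, 'b': 1, 'c': 0}
    for m in data:
        print(m.image_id, stage2_detection_targets(m, teacher_out))
    print(ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9))   # {'w': 0.9}
```

In this sketch the global label is used in both stages: as the only training target in stage 1, and in stage 2 as a gate deciding whether the teacher's detections on a weakly annotated image are turned into pseudo-labels. How the paper actually combines these signals should be checked against the full text.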

URL

https://arxiv.org/abs/2301.13418

PDF

https://arxiv.org/pdf/2301.13418.pdf