Abstract
Deep learning models benefit from training with a large dataset (labeled or unlabeled). Following this motivation, we present an approach to learn a deep learning model for the automatic segmentation of Organs at Risk (OARs) in cervical cancer radiation treatment from a large, clinically available dataset of Computed Tomography (CT) scans containing data inhomogeneity, label noise, and missing annotations. We employ simple heuristics for automatic data cleaning to minimize data inhomogeneity and label noise. Further, we develop a semi-supervised learning approach utilizing a teacher-student setup, annotation imputation, and uncertainty-guided training to learn in the presence of missing annotations. Our experimental results show that learning from a large dataset with our approach yields a significant improvement in test performance despite the missing annotations in the data. Further, the contours generated from the segmentation masks predicted by our model are found to be as clinically acceptable as manually generated contours.
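The annotation-imputation and uncertainty-guided training idea from the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the function name, the use of teacher softmax confidence as the uncertainty signal, and the confidence threshold are all assumptions made for the example.

```python
import numpy as np

def impute_and_weight(teacher_probs, labels, missing_mask, conf_thresh=0.8):
    """Fill in missing voxel labels with teacher pseudo-labels and return
    per-voxel loss weights that down-weight uncertain pseudo-labels.

    teacher_probs: (C, H, W) softmax probabilities from the teacher model
    labels:        (H, W) integer ground-truth labels (arbitrary where missing)
    missing_mask:  (H, W) bool, True where the annotation is missing

    NOTE: a hypothetical sketch; the paper's uncertainty measure and
    weighting scheme may differ.
    """
    pseudo = teacher_probs.argmax(axis=0)       # teacher's hard prediction
    confidence = teacher_probs.max(axis=0)      # teacher's per-voxel confidence
    # Annotation imputation: keep ground truth where available,
    # substitute teacher pseudo-labels where annotations are missing.
    imputed = np.where(missing_mask, pseudo, labels)
    # Uncertainty-guided weighting: annotated voxels get full weight;
    # imputed voxels contribute only when the teacher is confident.
    weights = np.where(missing_mask,
                       (confidence >= conf_thresh) * confidence,
                       1.0)
    return imputed, weights
```

A student model would then be trained with a segmentation loss weighted by `weights`, so that voxels with missing annotations and an uncertain teacher contribute nothing to the gradient.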
URL
https://arxiv.org/abs/2302.10661