Abstract
Radiologists routinely perform full-body, multi-organ, multi-disease detection and diagnosis in clinical practice, whereas most medical AI systems focus on a single organ and a narrow list of diseases. This might severely limit AI's clinical adoption: matching the diagnostic process of a human reading a CT scan would require assembling a number of such AI models, which is non-trivial. In this paper, we construct a Unified Tumor Transformer (UniT) model to detect tumors (existence and location) and diagnose them (tumor characteristics) in eight major cancer-prevalent organs in CT scans. UniT is a query-based Mask Transformer model that outputs multi-organ and multi-tumor semantic segmentation. We decouple the object queries into organ queries, detection queries, and diagnosis queries, and further establish hierarchical relationships among the three groups. This clinically inspired architecture effectively assists inter- and intra-organ representation learning of tumors and facilitates the resolution of these complex, anatomically related multi-organ cancer image reading tasks. UniT is trained end-to-end on a curated large-scale dataset of CT images from 10,042 patients, covering eight major types of cancers and occurring non-cancer tumors (all pathology-confirmed, with 3D tumor masks annotated by radiologists). On a test set of 631 patients, UniT demonstrates strong performance under a set of clinically relevant evaluation metrics, substantially outperforming both multi-organ segmentation methods and an assembly of eight single-organ expert models in tumor detection, segmentation, and diagnosis. Such a unified multi-cancer image reading model can significantly reduce the number of false positives produced by combined multi-system models. This moves one step closer to a universal high-performance cancer screening tool.
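The decoupling of object queries into three hierarchical groups can be illustrated with a small sketch. This is not the authors' implementation. It only shows one plausible way to lay out organ, detection, and diagnosis queries and to let each child query be gated by its parent. The placeholder organ names, the two diagnosis labels per organ (cancer and non-cancer), the `Query` record, and the multiplicative gating rule are all assumptions made for illustration; only the three query groups and the count of eight organs come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

NUM_ORGANS = 8                         # eight organs, as stated in the abstract
DIAGNOSES = ("cancer", "non_cancer")   # assumed diagnosis labels per organ


@dataclass(frozen=True)
class Query:
    index: int               # position in the flat query list
    group: str               # "organ", "detection", or "diagnosis"
    organ: int               # organ this query belongs to
    label: str               # placeholder name, hypothetical
    parent: Optional[int]    # index of the parent query, None for organ queries


def build_queries(num_organs: int = NUM_ORGANS) -> List[Query]:
    """Lay out organ -> detection -> diagnosis queries, parents first."""
    queries: List[Query] = []
    # One organ query per organ (top of the hierarchy).
    for organ in range(num_organs):
        queries.append(Query(len(queries), "organ", organ, f"organ_{organ}", None))
    # One tumor detection query per organ, child of that organ's query.
    for organ in range(num_organs):
        queries.append(
            Query(len(queries), "detection", organ, f"organ_{organ}/tumor", organ)
        )
    # Diagnosis queries, children of the matching detection query.
    for organ in range(num_organs):
        detection_index = num_organs + organ
        for name in DIAGNOSES:
            queries.append(
                Query(len(queries), "diagnosis", organ,
                      f"organ_{organ}/{name}", detection_index)
            )
    return queries


def hierarchical_scores(queries: Sequence[Query], raw: Sequence[float]) -> List[float]:
    """Gate each query's raw score (in [0, 1]) by its parent's gated score.

    Assumed rule: score(child) = raw(child) * score(parent), so a tumor score
    can never exceed the score of the organ that contains it.
    """
    if len(raw) != len(queries):
        raise ValueError("expected one raw score per query")
    scores: List[float] = []
    for query in queries:  # parents always precede children in the list
        gate = 1.0 if query.parent is None else scores[query.parent]
        scores.append(raw[query.index] * gate)
    return scores
```

In this layout, `build_queries()` yields 32 queries (8 organ, 8 detection, 16 diagnosis), and `hierarchical_scores` would be applied per voxel to the per-query probabilities of a mask decoder. The gating is just one simple way to encode that a diagnosis presupposes a detection, which in turn presupposes an organ.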
URL
https://arxiv.org/abs/2301.12291