AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets

Abstract

BACKGROUND: Lung cancer's high mortality rate can be mitigated by early detection, which increasingly relies on artificial intelligence (AI) in diagnostic imaging. However, the performance of AI models depends on the datasets used to train and validate them. METHODS: This study developed and validated the DLCSD-mD and LUNA16-mD models using the Duke Lung Cancer Screening Dataset (DLCSD), which comprises over 2,000 CT scans with more than 3,000 annotations. The models were evaluated on the internal DLCSD dataset and on the external LUNA16 and NLST datasets to establish a benchmark for imaging-based performance. The assessment focused on creating a standardized evaluation framework that allows consistent comparison with widely used datasets and a comprehensive validation of the models' efficacy. Diagnostic accuracy was assessed using free-response receiver operating characteristic (FROC) and area under the curve (AUC) analyses. RESULTS: On the internal DLCSD set, the DLCSD-mD model achieved an AUC of 0.93 (95% CI: 0.91-0.94), demonstrating high accuracy. Its performance was sustained on the external datasets, with AUCs of 0.97 (95% CI: 0.96-0.98) on LUNA16 and 0.75 (95% CI: 0.73-0.76) on NLST. Similarly, the LUNA16-mD model recorded an AUC of 0.96 (95% CI: 0.95-0.97) on its native dataset and showed transferable diagnostic performance, with AUCs of 0.91 (95% CI: 0.89-0.93) on DLCSD and 0.71 (95% CI: 0.70-0.72) on NLST. CONCLUSION: The DLCSD-mD model exhibits reliable performance across different datasets, establishing the DLCSD as a robust benchmark for lung cancer detection and diagnosis. By releasing our models and code publicly, we aim to accelerate the development of AI-based diagnostic tools and to encourage reproducibility and collaborative advances within the medical machine-learning (ML) field.
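The abstract reports each AUC with a 95% confidence interval but does not say how the intervals were computed. As an illustration only, the sketch below computes AUC through the Mann-Whitney formulation and attaches a percentile bootstrap interval. The bootstrap, the function names `auc` and `bootstrap_ci`, and the parameters `n_boot`, `alpha`, and `seed` are assumptions made for this example, not the authors' method.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Equivalent to the Mann-Whitney U statistic; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (an assumed CI method).

    Cases are resampled with replacement; resamples that contain only
    one class are skipped because AUC is undefined for them.
    """
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy labels (1 = malignant) and model scores, invented for illustration.
    toy_labels = [0, 0, 1, 1, 0, 1, 1, 0]
    toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.5]
    print(auc(toy_labels, toy_scores))
    print(bootstrap_ci(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; for datasets the size of those named above, a rank-based implementation such as the one in an established statistics library would be the more practical choice.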

URL

https://arxiv.org/abs/2405.04605

PDF

https://arxiv.org/pdf/2405.04605.pdf
