AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets

2024-05-07 18:36:40
Fakrul Islam Tushar, Avivah Wang, Lavsen Dahal, Michael R. Harowicz, Kyle J. Lafata, Tina D. Tailor, Joseph Y. Lo

Abstract

BACKGROUND: Lung cancer's high mortality rate can be mitigated by early detection, which increasingly relies on artificial intelligence (AI) for diagnostic imaging. However, the performance of AI models depends on the datasets used for their training and validation.

METHODS: This study developed and validated the DLCSD-mD and LUNA16-mD models using the Duke Lung Cancer Screening Dataset (DLCSD), which comprises over 2,000 CT scans with more than 3,000 annotations. The models were evaluated on the internal DLCSD and the external LUNA16 and NLST datasets to establish a benchmark for imaging-based performance. The assessment focused on creating a standardized evaluation framework that allows consistent comparison with widely used datasets and a comprehensive validation of the models' efficacy. Diagnostic performance was assessed using free-response receiver operating characteristic (FROC) and area under the curve (AUC) analyses.

RESULTS: On the internal DLCSD set, the DLCSD-mD model achieved an AUC of 0.93 (95% CI: 0.91-0.94). On the external datasets, it achieved AUCs of 0.97 (95% CI: 0.96-0.98) on LUNA16 and 0.75 (95% CI: 0.73-0.76) on NLST. Similarly, the LUNA16-mD model recorded an AUC of 0.96 (95% CI: 0.95-0.97) on its native dataset and showed transferable diagnostic performance, with AUCs of 0.91 (95% CI: 0.89-0.93) on DLCSD and 0.71 (95% CI: 0.70-0.72) on NLST.

CONCLUSION: The DLCSD-mD model exhibits reliable performance across different datasets, establishing the DLCSD as a robust benchmark for lung cancer detection and diagnosis. By releasing our models and code publicly, we aim to accelerate the development of AI-based diagnostic tools and to encourage reproducibility and collaborative advancement within the medical machine-learning (ML) field.
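The abstract reports each AUC with a 95% confidence interval but does not say how the intervals were obtained. The sketch below shows one common way to produce such numbers: AUC computed as the fraction of positive/negative pairs ranked correctly, with a percentile bootstrap for the interval. The bootstrap, the function names `auc` and `bootstrap_auc_ci`, and the toy labels and scores are all assumptions made for illustration; they are not taken from the paper or its released code.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (hypothetical helper).

    Cases are resampled with replacement; resamples that contain only
    one class are skipped because the AUC is undefined for them.
    Labels are assumed to be 0 (negative) or 1 (positive).
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data for illustration only: 1 = malignant, 0 = benign (made-up scores).
example_labels = [0, 0, 1, 1, 0, 1, 1, 0]
example_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.70, 0.50]

example_auc = auc(example_labels, example_scores)
example_ci = bootstrap_auc_ci(example_labels, example_scores)
print(f"AUC = {example_auc:.3f}, 95% CI = ({example_ci[0]:.3f}, {example_ci[1]:.3f})")
```

With only eight toy cases the interval is very wide. The narrow intervals in the abstract (for example 0.91-0.94) are what one would expect from evaluation sets with far more cases.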

URL

https://arxiv.org/abs/2405.04605

PDF

https://arxiv.org/pdf/2405.04605.pdf

