Automated Lesion Segmentation of Stroke MRI Using nnU-Net: A Comprehensive External Validation Across Acute and Chronic Lesions

2026-01-13 16:29:20
Tammar Truzman, Matthew A. Lambon Ralph, Ajay D. Halai

Abstract

Accurate and generalisable segmentation of stroke lesions from magnetic resonance imaging (MRI) is essential for advancing clinical research, prognostic modelling, and personalised interventions. Although deep learning has improved automated lesion delineation, many existing models are optimised for narrow imaging contexts and generalise poorly to independent datasets, modalities, and stroke stages. Here, we systematically evaluated stroke lesion segmentation using the nnU-Net framework across multiple heterogeneous, publicly available MRI datasets spanning acute and chronic stroke. Models were trained and tested on diffusion-weighted imaging (DWI), fluid-attenuated inversion recovery (FLAIR), and T1-weighted MRI, and evaluated on independent datasets. Across stroke stages, models showed robust generalisation, with segmentation accuracy approaching reported inter-rater reliability. Performance varied with imaging modality and training data characteristics. In acute stroke, DWI-trained models consistently outperformed FLAIR-based models, with only modest gains from multimodal combinations. In chronic stroke, increasing training set size improved performance, with diminishing returns beyond several hundred cases. Lesion volume was a key determinant of accuracy: smaller lesions were harder to segment, and models trained on restricted volume ranges generalised poorly. MRI image quality further constrained generalisability: models trained on lower-quality scans transferred poorly, whereas those trained on higher-quality data generalised well to noisier images. Discrepancies between predictions and reference masks were often attributable to limitations in manual annotations. Together, these findings show that automated lesion segmentation can approach human-level performance while identifying key factors governing generalisability and informing the development of lesion segmentation tools.
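As a rough illustration of how agreement between a predicted lesion mask and a manual reference mask can be scored, the sketch below computes the Dice overlap coefficient on flattened binary masks. The abstract does not name the metric used, so the choice of Dice here is an assumption, as are the function names `dice_coefficient` and `lesion_volume_ml` and the toy masks. It also shows one plausible reason small lesions score lower: a single missed voxel costs far more overlap in a small lesion than in a large one.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks of equal length.

    Assumption: Dice is used only as an illustrative overlap metric;
    the abstract does not state which metric the authors report.
    """
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Both masks empty: treated as perfect agreement (a convention chosen here).
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def lesion_volume_ml(mask: Sequence[int], voxel_volume_mm3: float) -> float:
    """Lesion volume in millilitres from a binary mask and the voxel volume."""
    return sum(1 for v in mask if v) * voxel_volume_mm3 / 1000.0


# Toy example: the prediction misses exactly one voxel in each case.
small_ref = [1, 1] + [0] * 18      # 2-voxel lesion
small_pred = [1, 0] + [0] * 18
large_ref = [1] * 20               # 20-voxel lesion
large_pred = [1] * 19 + [0]

small_dice = dice_coefficient(small_pred, small_ref)
large_dice = dice_coefficient(large_pred, large_ref)
print(f"small lesion Dice: {small_dice:.3f}")
print(f"large lesion Dice: {large_dice:.3f}")
```

In practice the masks would be 3D volumes loaded from NIfTI files and flattened before scoring; that loading step is omitted here to keep the sketch dependency-free.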

URL

https://arxiv.org/abs/2601.08701

PDF

https://arxiv.org/pdf/2601.08701.pdf

