Abstract
Diabetic retinopathy (DR) affects millions of people worldwide, and its prevalence is projected to rise significantly; it carries a severe risk of blindness and strains healthcare systems. Diagnosis is complicated by the overlap of its visual symptoms with those of conditions such as age-related macular degeneration and hypertensive retinopathy, a problem exacerbated by high misdiagnosis rates in underserved regions. This study introduces TIMM-ProRS, a novel deep learning framework that integrates a Vision Transformer (ViT), a Convolutional Neural Network (CNN), and a Graph Neural Network (GNN) with multi-modal fusion. TIMM-ProRS combines retinal images with temporal biomarkers (HbA1c, retinal thickness) to capture both multi-modal and temporal dynamics. Trained on APTOS 2019 and validated on Messidor-2, RFMiD, EyePACS, and Messidor-1, the model achieves 97.8% accuracy and an F1-score of 0.96, demonstrating state-of-the-art performance and outperforming existing methods such as RSG-Net and DeepDR. This approach enables early, precise, and interpretable diagnosis, supporting scalable telemedicine-based management and enhancing the sustainability of global eye health.
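The abstract does not specify how image features and temporal biomarkers are fused, so the following is only a minimal sketch of one common option (simple late fusion by concatenation), not the TIMM-ProRS implementation. The function names `temporal_summary` and `fuse_features`, the choice of summary statistics, and the example values are all hypothetical.

```python
from typing import List, Sequence


def temporal_summary(series: Sequence[float]) -> List[float]:
    """Summarize one biomarker time series as [latest, mean, change per visit].

    Hypothetical helper: these statistics are an illustrative assumption,
    not something stated in the abstract.
    """
    if not series:
        raise ValueError("series must contain at least one measurement")
    latest = series[-1]
    mean = sum(series) / len(series)
    # Average change between consecutive visits; 0.0 if there is only one visit.
    slope = (series[-1] - series[0]) / (len(series) - 1) if len(series) > 1 else 0.0
    return [latest, mean, slope]


def fuse_features(
    image_embedding: Sequence[float],
    hba1c: Sequence[float],
    retinal_thickness: Sequence[float],
) -> List[float]:
    """Late fusion by concatenation (assumed for illustration only).

    image_embedding stands in for the output of an image encoder such as a
    ViT or CNN; the biomarker series are ordered from oldest to newest visit.
    """
    return (
        list(image_embedding)
        + temporal_summary(hba1c)
        + temporal_summary(retinal_thickness)
    )


# Usage with made-up values: a 2-dimensional image embedding, three HbA1c
# readings (%), and two retinal thickness readings (micrometres).
fused = fuse_features([0.1, 0.2], [7.0, 7.5, 8.0], [250.0, 260.0])
```

A fused vector of this kind would then be passed to a downstream classifier. A real system would more likely learn the temporal representation than rely on hand-picked statistics.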
URL
https://arxiv.org/abs/2601.08240