Paper Reading AI Learner

Temporal-Enhanced Interpretable Multi-Modal Prognosis and Risk Stratification Framework for Diabetic Retinopathy

2026-01-13 05:57:47
Susmita Kar, A S M Ahsanul Sarkar Akib, Abdul Hasib, Samin Yaser, Anas Bin Azim

Abstract

Diabetic retinopathy (DR) affects millions of people globally, with projections indicating a significant rise; it poses a severe blindness risk and strains healthcare systems. Diagnosis is complicated by visual symptom overlap with conditions such as age-related macular degeneration and hypertensive retinopathy, and by high misdiagnosis rates in underserved regions. This study introduces TIMM-ProRS, a novel deep learning framework that integrates a Vision Transformer (ViT), a Convolutional Neural Network (CNN), and a Graph Neural Network (GNN) through multi-modal fusion. TIMM-ProRS uniquely leverages both retinal images and temporal biomarkers (HbA1c, retinal thickness) to capture multi-modal and temporal dynamics. Trained on APTOS 2019 and validated on Messidor-2, RFMiD, EyePACS, and Messidor-1, the model achieves 97.8% accuracy and an F1-score of 0.96, demonstrating state-of-the-art performance and outperforming existing methods such as RSG-Net and DeepDR. This approach enables early, precise, and interpretable diagnosis, supporting scalable telemedicine-based management and enhancing the sustainability of global eye health.
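The abstract describes fusing image features with temporal biomarkers such as HbA1c trajectories. The paper's code is not included here, so the following is only a minimal NumPy sketch of one plausible late-fusion scheme: an image branch standing in for the ViT/CNN/GNN encoders, a temporal branch that summarizes a biomarker series by its latest value and trend, and a softmax head over DR grades. All shapes, weights, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_image(img_feat, W):
    # Stand-in for the ViT/CNN/GNN image branch: a single nonlinear projection.
    return np.tanh(img_feat @ W)

def encode_temporal(series):
    # Summarize a biomarker time series (e.g. HbA1c over visits) by its
    # most recent value and its linear trend (slope across visits).
    t = np.arange(len(series))
    slope = np.polyfit(t, series, 1)[0]
    return np.array([series[-1], slope])

def fuse_and_classify(img_emb, bio_emb, W_out):
    # Late fusion: concatenate both modality embeddings, then softmax.
    z = np.concatenate([img_emb, bio_emb]) @ W_out
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy example: 16-d image features, HbA1c over 4 visits, 5 DR severity grades.
img_feat = rng.normal(size=16)
hba1c = np.array([7.1, 7.4, 7.9, 8.3])
W_img = rng.normal(size=(16, 8))
W_out = rng.normal(size=(8 + 2, 5))

probs = fuse_and_classify(encode_image(img_feat, W_img),
                          encode_temporal(hba1c), W_out)
print(probs.shape)
```

In practice the two branches would be trained jointly, and the temporal branch would likely be a learned sequence model rather than a hand-crafted summary; the sketch only illustrates how the two modalities can meet in a shared classification head.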

URL

https://arxiv.org/abs/2601.08240

PDF

https://arxiv.org/pdf/2601.08240.pdf

