Evaluating Fundus-Specific Foundation Models for Diabetic Macular Edema Detection

2025-10-08 17:41:02
Franco Javier Arellano, José Ignacio Orlando

Abstract

Diabetic Macular Edema (DME) is a leading cause of vision loss among patients with Diabetic Retinopathy (DR). While deep learning has shown promising results for automatically detecting this condition from fundus images, its application remains challenging due to the limited availability of annotated data. Foundation Models (FMs) have emerged as an alternative solution. However, it is unclear whether they can cope with DME detection in particular. In this paper, we systematically compare different FMs and standard transfer learning approaches for this task. Specifically, we compare the two most popular FMs for retinal images (RETFound and FLAIR) and an EfficientNet-B0 backbone, across different training regimes and evaluation settings in IDRiD, MESSIDOR-2 and OCT-and-Eye-Fundus-Images (OEFI). Results show that, despite their scale, FMs do not consistently outperform fine-tuned CNNs in this task. In particular, EfficientNet-B0 ranked first or second in terms of area under the ROC and precision/recall curves in most evaluation settings, with RETFound only showing promising results in OEFI. FLAIR, on the other hand, demonstrated competitive zero-shot performance, achieving notable AUC-PR scores when prompted appropriately. These findings suggest that FMs might not be a good tool for fine-grained ophthalmic tasks such as DME detection, even after fine-tuning, and that lightweight CNNs remain strong baselines in data-scarce environments.
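
The abstract ranks models by the area under the ROC curve and the area under the precision/recall curve. As a minimal sketch of how these two metrics can be computed for a binary DME classifier, the Python below uses only the standard library. It is not the authors' evaluation code: the function names and the toy labels and scores are hypothetical, and AUC-PR is approximated here by average precision, which may differ from the estimator used in the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Counts the fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common estimate of the area under the PR curve.

    Averages the precision measured at the rank of each positive sample.
    This simple version assumes there are no tied scores.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical toy data: 1 = DME present, 0 = DME absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

print(f"AUC-ROC: {roc_auc(toy_labels, toy_scores):.4f}")
print(f"AUC-PR (average precision): {average_precision(toy_labels, toy_scores):.4f}")
```

Reporting both metrics matters when positives are rare: AUC-ROC is insensitive to class prevalence, while average precision drops when a model produces many false positives relative to the number of true DME cases.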

URL

https://arxiv.org/abs/2510.07277

PDF

https://arxiv.org/pdf/2510.07277.pdf

