Abstract
Accurate measurement of eyelid parameters such as Margin Reflex Distances (MRD1, MRD2) and Levator Function (LF) is critical in oculoplastic diagnostics but remains limited by manual, inconsistent methods. This study evaluates three deep learning models (SE-ResNet, EfficientNet, and the vision-transformer-based DINOv2) for automating these measurements from smartphone-acquired images. We assess performance in frozen and fine-tuned settings using mean squared error (MSE), mean absolute error (MAE), and R² metrics. DINOv2, pretrained through self-supervised learning, demonstrates superior scalability and robustness, especially in the frozen setting, which is well suited to mobile deployment. Lightweight regressors such as MLP and Deep Ensemble offer high precision with minimal computational overhead. To address class imbalance and improve generalization, we integrate focal loss, orthogonal regularization, and binary encoding strategies. Our results show that DINOv2 combined with these enhancements delivers consistent, accurate predictions across all tasks, making it a strong candidate for real-world, mobile-friendly clinical applications. This work highlights the potential of foundation models in advancing AI-powered ophthalmic care.
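As a reading aid for the three metrics named above, the following minimal sketch computes MSE, MAE, and R² for a set of regression predictions. It is not the authors' evaluation code: the function name `regression_metrics` and the sample values (imagined MRD1 measurements in millimetres) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MSE, MAE, and R^2 for paired ground-truth and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Mean squared error: average of squared residuals.
    mse = sum(e * e for e in errors) / n
    # Mean absolute error: average of absolute residuals.
    mae = sum(abs(e) for e in errors) / n

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread around the mean target.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when every target is identical, so report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Hypothetical MRD1 values in millimetres, for illustration only.
    measured = [2.0, 4.0, 6.0]
    predicted = [3.0, 4.0, 5.0]
    print(regression_metrics(measured, predicted))
```

MSE penalizes large errors more heavily than MAE, while R² expresses how much of the variance in the measured values the predictions account for, which is why the three are often reported together.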
URL
https://arxiv.org/abs/2504.00515