Paper Reading AI Learner

Training Frozen Feature Pyramid DINOv2 for Eyelid Measurements with Infinite Encoding and Orthogonal Regularization


Abstract

Accurate measurement of eyelid parameters such as Margin Reflex Distances (MRD1, MRD2) and Levator Function (LF) is critical in oculoplastic diagnostics but remains limited by manual, inconsistent methods. This study evaluates three deep learning models, SE-ResNet, EfficientNet, and the vision-transformer-based DINOv2, for automating these measurements from smartphone-acquired images. We assess performance in both frozen and fine-tuned settings using MSE, MAE, and R² as metrics. DINOv2, pretrained through self-supervised learning, demonstrates superior scalability and robustness, especially under frozen conditions, which are ideal for mobile deployment. Lightweight regressors such as an MLP and a Deep Ensemble offer high precision with minimal computational overhead. To address class imbalance and improve generalization, we integrate focal loss, orthogonal regularization, and binary encoding strategies. Our results show that DINOv2 combined with these enhancements delivers consistent, accurate predictions across all tasks, making it a strong candidate for real-world, mobile-friendly clinical applications. This work highlights the potential of foundation models in advancing AI-powered ophthalmic care.
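
To make the pipeline described in the abstract concrete, the sketch below shows one plausible way to pair a frozen DINOv2 backbone with a lightweight MLP regression head and a soft orthogonal-regularization penalty. This is an illustrative reconstruction, not the authors' released code: the backbone variant (ViT-S/14 via the public torch.hub entrypoint), the 384-dimensional feature size, the head architecture, and the regularization weight are all assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code): frozen DINOv2 features
# feeding an MLP head that regresses [MRD1, MRD2, LF], with a soft orthogonal
# penalty on the head's weight matrices.
import torch
import torch.nn as nn

# Load a pretrained DINOv2 ViT-S/14 backbone and freeze it (feature extractor only).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

# Lightweight MLP regressor on top of the frozen 384-d CLS embedding.
head = nn.Sequential(
    nn.Linear(384, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 3),            # predicts [MRD1, MRD2, LF] in millimetres
)

def orthogonal_penalty(module: nn.Module):
    """Soft orthogonality: sum of ||W W^T - I||_F^2 over the head's Linear layers."""
    penalty = 0.0
    for m in module.modules():
        if isinstance(m, nn.Linear):
            w = m.weight                                  # (out_features, in_features)
            gram = w @ w.t()
            eye = torch.eye(gram.size(0), device=w.device)
            penalty = penalty + ((gram - eye) ** 2).sum()
    return penalty

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
mse = nn.MSELoss()
lambda_orth = 1e-4                                        # assumed regularization weight

def training_step(images: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # images: (B, 3, 224, 224), ImageNet-normalized (224 is divisible by the patch size 14).
    with torch.no_grad():                                 # backbone stays frozen
        feats = backbone(images)                          # (B, 384) CLS features
    preds = head(feats)                                   # (B, 3)
    loss = mse(preds, targets) + lambda_orth * orthogonal_penalty(head)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```

Because only the head is trained, each update touches a few hundred thousand parameters, which is consistent with the abstract's point that frozen DINOv2 features plus a lightweight regressor keep computational overhead low for mobile-oriented deployment.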

URL

https://arxiv.org/abs/2504.00515

PDF

https://arxiv.org/pdf/2504.00515.pdf
