LEyes: A Lightweight Framework for Deep Learning-Based Eye Tracking using Synthetic Eye Images

2023-09-12 11:08:14
Sean Anthony Byrne, Virmarie Maquiling, Marcus Nyström, Enkelejda Kasneci, Diederick C. Niehorster

Abstract

Deep learning has bolstered gaze estimation techniques, but real-world deployment has been impeded by inadequate training datasets. This problem is exacerbated by both hardware-induced variations in eye images and inherent biological differences across the recorded participants, leading to both feature- and pixel-level variance that hinders the generalizability of models trained on specific datasets. While synthetic datasets can be a solution, their creation is both time- and resource-intensive. To address this problem, we present a framework called Light Eyes, or "LEyes", which, unlike conventional photorealistic methods, models only the key image features required for video-based eye tracking, using simple light distributions. LEyes facilitates easy configuration for training neural networks across diverse gaze-estimation tasks. We demonstrate that models trained using LEyes outperform other state-of-the-art algorithms in pupil and corneal reflection (CR) localization across well-known datasets. In addition, a LEyes-trained model outperforms an industry-standard eye tracker while using significantly more cost-effective hardware. Going forward, we are confident that LEyes will revolutionize synthetic data generation for gaze estimation models and lead to significant improvements in the next generation of video-based eye trackers.
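
To make the core idea concrete, the sketch below shows one plausible way to generate the kind of training image the abstract describes: instead of rendering a photorealistic eye, it composes simple light distributions, here a dark Gaussian blob for the pupil and a small bright one for the corneal reflection, on a noisy background, with the ground-truth centers returned as labels. This is an illustrative assumption of how such a generator could look, not the paper's actual implementation; all function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_blob(h, w, cx, cy, sigma):
    """2D Gaussian centred at (cx, cy) with peak value 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def synth_eye_image(h=128, w=128, rng=None):
    """Compose a crude synthetic eye image from simple light distributions:
    a noisy mid-grey background, a dark pupil blob, and a bright CR spot.
    Returns the image plus ground-truth pupil and CR centres for training.
    (Illustrative only; not the LEyes generator itself.)"""
    rng = rng or np.random.default_rng()
    img = 0.55 + 0.03 * rng.standard_normal((h, w))        # noisy background

    # Pupil: a large dark Gaussian at a random position
    px = rng.uniform(0.3 * w, 0.7 * w)
    py = rng.uniform(0.3 * h, 0.7 * h)
    img -= 0.45 * gaussian_blob(h, w, px, py, sigma=rng.uniform(8, 16))

    # Corneal reflection: a small bright Gaussian near the pupil
    cx = px + rng.uniform(-10, 10)
    cy = py + rng.uniform(-10, 10)
    img += 0.5 * gaussian_blob(h, w, cx, cy, sigma=rng.uniform(1.5, 3))

    img = np.clip(img, 0.0, 1.0)
    return img.astype(np.float32), (px, py), (cx, cy)
```

Sampling many such images with randomized positions, sizes, and noise yields a labeled dataset for a pupil/CR localization network without any manual annotation, which is the practical appeal of this style of synthetic data.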

URL

https://arxiv.org/abs/2309.06129

PDF

https://arxiv.org/pdf/2309.06129.pdf
