Paper Reading AI Learner

Towards Precision in Appearance-based Gaze Estimation in the Wild

2023-02-05 10:09:35
Murthy L.R.D., Abhishek Mukhopadhyay, Shambhavi Aggarwal, Ketan Anand, Pradipta Biswas

Abstract

Appearance-based gaze estimation systems have shown great progress recently, yet the performance of these techniques depends on the datasets used for training. Most of the existing gaze estimation datasets set up in interactive settings were recorded in laboratory conditions, and those recorded in the wild display limited head pose and illumination variation. Further, precision evaluation of existing gaze estimation approaches has received little attention so far. In this work, we present a large gaze estimation dataset, PARKS-Gaze, with wider head pose and illumination variation and with multiple samples for a single Point of Gaze (PoG). The dataset contains 974 minutes of data from 28 participants, with a head pose range of 60 degrees in both the yaw and pitch directions. Our within-dataset, cross-dataset, and precision evaluations indicate that the proposed dataset is more challenging and enables models to generalize to unseen participants better than the existing in-the-wild datasets. The project page can be accessed here: this https URL
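The abstract reports precision evaluations over the multiple samples recorded per PoG, but does not spell out the metric here. A minimal sketch of one common formulation, assuming on-screen PoG predictions in millimeters and hypothetical helper names (accuracy_mm, precision_mm): accuracy as the mean error to the ground-truth target, precision as the RMS spread of repeated predictions around their own mean.

import numpy as np

def accuracy_mm(preds, gt):
    # Mean Euclidean distance (mm) between predicted points and the
    # ground-truth Point of Gaze for one on-screen target.
    return np.linalg.norm(preds - gt, axis=1).mean()

def precision_mm(preds):
    # RMS spread (mm) of repeated predictions around their own mean:
    # low values mean the estimator is stable across the multiple
    # samples recorded for a single PoG, regardless of any bias.
    centered = preds - preds.mean(axis=0)
    return np.sqrt((np.linalg.norm(centered, axis=1) ** 2).mean())

# Toy example (hypothetical numbers): five repeated predictions for one target.
gt = np.array([160.0, 90.0])
preds = np.array([[162.1, 88.7],
                  [158.9, 91.2],
                  [161.4, 90.3],
                  [159.6, 89.1],
                  [160.8, 90.9]])
print(f"accuracy:  {accuracy_mm(preds, gt):.2f} mm")
print(f"precision: {precision_mm(preds):.2f} mm")

Under this reading, a biased but stable estimator can score well on precision while scoring poorly on accuracy, which is why the two are evaluated separately.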


URL

https://arxiv.org/abs/2302.02353

PDF

https://arxiv.org/pdf/2302.02353.pdf


Tags
Gaze_Estimation