Paper Reading AI Learner

A Differential Approach for Gaze Estimation

2019-04-20 15:17:45
Gang Liu, Yu Yu, Kenneth A. Funes Mora, Jean-Marc Odobez

Abstract

Non-invasive gaze estimation methods usually regress gaze directions directly from a single face or eye image. However, due to important variability in eye shape and inner eye structure amongst individuals, universal models obtain limited accuracy, and their outputs usually exhibit high variance as well as subject-dependent biases. Therefore, accuracy is usually increased through calibration, allowing the gaze predictions for a subject to be mapped to his/her actual gaze. In this paper, we introduce a novel image-differential method for gaze estimation. We propose to directly train a differential convolutional neural network to predict the gaze difference between two eye input images of the same subject. Then, given a set of subject-specific calibration images, we can use the inferred differences to predict the gaze direction of a novel eye sample. The assumption is that by allowing the comparison between two eye images, nuisance factors (alignment, eyelid closure, illumination perturbations) which usually plague single-image prediction methods can be much reduced, allowing better predictions altogether. Experiments on 3 public datasets validate our approach, which consistently outperforms state-of-the-art methods even when using only one calibration sample or when the latter methods are followed by subject-specific gaze adaptation.
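The inference step the abstract describes can be sketched in a few lines: a trained differential network predicts the gaze difference between a calibration image and a novel image of the same subject, and the novel gaze is estimated by shifting each known calibration gaze by the predicted difference and averaging. This is a minimal illustrative sketch, not the paper's implementation; the function names are hypothetical, and the CNN is replaced by a stub that computes differences exactly.

```python
def predict_gaze(calibration, diff_net, x):
    """Estimate the gaze of a novel eye sample x.

    calibration: list of (image, (yaw, pitch)) pairs for the subject.
    diff_net: callable predicting the gaze difference between two images.
    Each calibration gaze is shifted by the predicted difference and the
    shifted estimates are averaged, as in the paper's differential scheme.
    """
    preds = [(g[0] + d[0], g[1] + d[1])
             for img, g in calibration
             for d in [diff_net(img, x)]]
    n = len(preds)
    return (sum(p[0] for p in preds) / n, sum(p[1] for p in preds) / n)


# Stub "differential network" for illustration only: eye images are stood
# in for by their true (yaw, pitch) gaze, so the predicted difference is
# exact. A real system would run a shared-weight CNN on two image tensors.
def toy_diff_net(img_a, img_b):
    return (img_b[0] - img_a[0], img_b[1] - img_a[1])


calib = [((0.1, -0.2), (0.1, -0.2)), ((0.3, 0.0), (0.3, 0.0))]
yaw, pitch = predict_gaze(calib, toy_diff_net, (0.25, 0.05))
```

With an exact stub the averaged estimate recovers the novel gaze; with a learned network, averaging over several calibration samples is what damps the per-pair prediction noise.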

URL

https://arxiv.org/abs/1904.09459

PDF

https://arxiv.org/pdf/1904.09459.pdf

