Abstract
Estimating human gaze from natural eye images alone is a challenging task. Gaze direction can be defined by the pupil center and the eyeball center, but the latter is unobservable in 2D images. Hence, achieving highly accurate gaze estimates is an ill-posed problem. In this paper, we introduce a novel deep neural network architecture specifically designed for gaze estimation from single-eye input. Instead of directly regressing two angles for the pitch and yaw of the eyeball, we regress to an intermediate pictorial representation, which in turn simplifies the task of 3D gaze direction estimation. Our quantitative and qualitative results show that our approach achieves higher accuracy than the state of the art and is robust to variation in gaze, head pose, and image quality.
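
The geometric relation mentioned in the abstract (gaze direction defined by the eyeball center and the pupil center, and expressed as pitch and yaw) can be illustrated with a minimal sketch. This is not the paper's network or its pictorial representation; it only shows the underlying geometry when both 3D centers are assumed to be known. The function names and the axis convention (x right, y up, z forward, yaw measured around the y axis) are assumptions made for illustration and are not taken from the paper.

```python
import math


def gaze_vector(eyeball_center, pupil_center):
    """Unit vector pointing from the eyeball center to the pupil center."""
    d = [p - e for p, e in zip(pupil_center, eyeball_center)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0.0:
        raise ValueError("pupil and eyeball centers coincide")
    return tuple(c / norm for c in d)


def vector_to_pitch_yaw(g):
    """Convert a unit gaze vector to (pitch, yaw) in radians.

    Assumed convention (not from the paper): x right, y up, z forward.
    """
    x, y, z = g
    pitch = math.asin(max(-1.0, min(1.0, y)))  # clamp against rounding error
    yaw = math.atan2(x, z)
    return pitch, yaw


def pitch_yaw_to_vector(pitch, yaw):
    """Inverse of vector_to_pitch_yaw under the same assumed convention."""
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


# Example with made-up coordinates: the pupil lies straight ahead of the eyeball center.
g = gaze_vector((0.0, 0.0, 0.0), (0.0, 0.0, 2.0))
pitch, yaw = vector_to_pitch_yaw(g)
```

In a 2D image only the projection of the pupil is visible and the eyeball center is not, which is why this direct computation is unavailable in practice and the estimation problem is ill-posed.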
URL
https://arxiv.org/abs/1807.10002