Abstract
Gaze tracking is an important technology in many domains. Techniques such as Convolutional Neural Networks (CNNs) have enabled gaze tracking methods that rely only on commodity hardware, such as the camera on a personal computer. Using the full-face region for gaze estimation has been shown to perform better than using an eye image alone. However, full-face images require heavy computation because of their larger size. This study tackles the problem by compressing the input full-face image, removing redundant information with a novel learnable pooling module. The module can be trained end-to-end by backpropagation to learn the size of the grid in the pooling filter. It keeps the resolution high in valuable regions and lowers it elsewhere. The proposed method preserved gaze estimation accuracy at a certain level when the image was reduced to a smaller size.
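To make the idea of a pooling grid with learnable cell sizes concrete, the following is a minimal NumPy sketch of the forward pass only. It is an illustration under stated assumptions, not the authors' implementation: the softmax parameterization of cell sizes, the area-weighted averaging, and the names `pooling_matrix` and `learnable_grid_pool` are all hypothetical. In an autodiff framework, the logits would be the parameters updated by backpropagation.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def pooling_matrix(logits, in_size):
    """Build an (out_size, in_size) area-weighted averaging matrix.

    Assumption: one logit per output cell; softmax turns the logits into
    each cell's share of the axis, so the cells always tile [0, in_size].
    """
    shares = softmax(np.asarray(logits, dtype=float))
    edges = np.concatenate([[0.0], np.cumsum(shares * in_size)])
    edges[-1] = float(in_size)      # guard against rounding drift
    widths = np.diff(edges)         # cell sizes in pixels

    px_lo = np.arange(in_size, dtype=float)   # pixel j spans [j, j + 1]
    px_hi = px_lo + 1.0
    # Length of the overlap between pixel j and cell i along this axis.
    overlap = (np.minimum(px_hi[None, :], edges[1:, None])
               - np.maximum(px_lo[None, :], edges[:-1, None]))
    overlap = np.clip(overlap, 0.0, None)
    # Normalize so every row sums to 1 (each output is a weighted mean).
    return overlap / widths[:, None]


def learnable_grid_pool(image, row_logits, col_logits):
    """Pool a 2-D image with a separable, non-uniform grid."""
    h, w = image.shape
    p_rows = pooling_matrix(row_logits, h)
    p_cols = pooling_matrix(col_logits, w)
    return p_rows @ image @ p_cols.T
```

With all logits equal, the sketch reduces to ordinary average pooling. Lowering a cell's logit shrinks that cell, so the corresponding image region is sampled at a finer resolution, while larger cells average over more pixels.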
URL
https://arxiv.org/abs/1903.05761