Abstract
Recently, appearance-based gaze estimation has attracted increasing attention in computer vision, and remarkable improvements have been achieved with various deep learning techniques. Despite this progress, most methods aim to infer gaze vectors directly from images, which leads to overfitting to person-specific appearance factors. In this paper, we address this challenge and propose a novel framework, Stochastic subject-wise Adversarial gaZE learning (SAZE), which trains a network to generalize across subjects' appearances. We design a Face generalization Network (Fgen-Net) consisting of a face-to-gaze encoder and a face identity classifier, trained with a proposed adversarial loss. This loss generalizes face appearance factors by forcing the identity classifier to predict a uniform probability distribution over subjects. In addition, Fgen-Net is trained with a learning mechanism that reselects a subset of subjects at every training step to avoid overfitting. Our experimental results verify the robustness of the method: it yields state-of-the-art performance, achieving mean angular errors of 3.89° and 4.42° on the MPIIGaze and EyeDiap datasets, respectively. Furthermore, we demonstrate the positive generalization effect through additional experiments using face images of different styles produced by a generative model.
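The abstract describes two key ideas: an adversarial loss that drives the face identity classifier toward a uniform distribution over subjects, and stochastic subject-wise reselection of training subjects at every step. Below is a minimal, hypothetical PyTorch sketch of how such a training step could be realized. The names (encoder, gaze_head, id_head, data_by_subject), the L1 gaze loss, and the loss weight lam are illustrative assumptions, not the authors' implementation; in particular, the KL term to a uniform target is just one common way to express "the identity classifier predicts a uniform probability distribution".

```python
import random
import torch
import torch.nn.functional as F


def adversarial_identity_loss(id_logits: torch.Tensor) -> torch.Tensor:
    """Encourage the identity classifier to output a uniform distribution,
    so the shared features carry no subject-specific appearance cues."""
    num_subjects = id_logits.size(1)
    uniform = torch.full_like(id_logits, 1.0 / num_subjects)
    return F.kl_div(F.log_softmax(id_logits, dim=1), uniform, reduction="batchmean")


def training_step(encoder, gaze_head, id_head, data_by_subject,
                  optimizer, num_subjects_per_step=8, lam=0.1):
    # Stochastic subject-wise selection: resample a subset of subjects each
    # step so no single subject's appearance dominates the update.
    # Assumption: data_by_subject maps subject id -> (image batch, gaze batch).
    chosen = random.sample(list(data_by_subject), num_subjects_per_step)
    images = torch.cat([data_by_subject[s][0] for s in chosen])
    gaze_targets = torch.cat([data_by_subject[s][1] for s in chosen])

    features = encoder(images)                      # face-to-gaze features
    gaze_loss = F.l1_loss(gaze_head(features), gaze_targets)
    adv_loss = adversarial_identity_loss(id_head(features))

    loss = gaze_loss + lam * adv_loss               # illustrative joint objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```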
Abstract (translated)
In recent years, appearance-based gaze estimation has attracted attention in computer vision, and remarkable improvements have been achieved using various deep learning techniques. However, most methods aim to infer gaze vectors directly from images, which leads to overfitting to person-specific appearance factors. In this paper, we address this challenge and propose a novel framework, Stochastic subject-wise Adversarial gaZE learning (SAZE), which trains a network to generalize across subjects. We design Fgen-Net using a face-to-gaze encoder and a face identity classifier. The proposed loss generalizes face appearance factors so that the identity classifier predicts a uniform probability distribution. In addition, the network is optimized by reselecting a subset of subjects at every training step to avoid overfitting. Our experiments show that the method is robust, achieving 3.89 and 4.42 on the MPIIGaze and EyeDiap datasets, respectively. Furthermore, we verify the positive generalization effect through further experiments using face images of different styles.
URL
https://arxiv.org/abs/2401.13865