Paper Reading AI Learner

Appearance Debiased Gaze Estimation via Stochastic Subject-Wise Adversarial Learning

2024-01-25 00:23:21
Suneung Kim, Woo-Jeoung Nam, Seong-Whan Lee

Abstract

Recently, appearance-based gaze estimation has been attracting attention in computer vision, and remarkable improvements have been achieved using various deep learning techniques. Despite such progress, most methods aim to infer gaze vectors directly from images, which causes overfitting to person-specific appearance factors. In this paper, we address these challenges and propose a novel framework: Stochastic subject-wise Adversarial gaZE learning (SAZE), which trains a network to generalize over subject appearance. We design a Face generalization Network (Fgen-Net) consisting of a face-to-gaze encoder and a face identity classifier, trained with a proposed adversarial loss. The proposed loss generalizes face appearance factors so that the identity classifier predicts a uniform probability distribution over subjects. In addition, Fgen-Net is trained with a learning mechanism that reselects a subset of subjects at every training step to avoid overfitting. Our experimental results verify the robustness of the method: it yields state-of-the-art performance, achieving mean angular errors of 3.89° and 4.42° on the MPIIGaze and EyeDiap datasets, respectively. Furthermore, we demonstrate the positive generalization effect through further experiments on face images of different styles generated by a generative model.
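
The abstract describes two mechanisms that can be made concrete: an adversarial loss that pushes the identity classifier toward a uniform output, and stochastic subject-wise resampling at every training step. Below is a minimal PyTorch-style sketch of both, assuming a generic cross-entropy-to-uniform formulation of the adversarial loss and hypothetical function names; the paper's exact loss and the Fgen-Net architecture are specified in the full text.

```python
import torch
import torch.nn.functional as F

def adversarial_uniform_loss(identity_logits: torch.Tensor) -> torch.Tensor:
    # Cross-entropy between the classifier's predicted distribution and a
    # uniform target over K subjects: -(1/K) * sum_k log p_k, averaged over
    # the batch. Minimizing it drives the identity classifier's output toward
    # a uniform distribution, stripping identity cues from the gaze features.
    log_probs = F.log_softmax(identity_logits, dim=1)
    return -log_probs.mean(dim=1).mean()

def sample_subject_subset(subject_ids, k, generator=None):
    # Stochastic subject-wise resampling: draw a fresh subset of k subjects
    # before each training step so the encoder does not overfit to a fixed
    # group of identities.
    idx = torch.randperm(len(subject_ids), generator=generator)[:k]
    return [subject_ids[i] for i in idx]

# Example: logits for a batch of 8 face images over 15 training subjects.
logits = torch.randn(8, 15)
loss = adversarial_uniform_loss(logits)  # minimized when softmax(logits) is uniform
subset = sample_subject_subset(list(range(15)), k=4)
```

Note that the loss reaches its minimum (log K) exactly when the predicted distribution is uniform, which is the stated goal of the adversarial objective; how this sketch interacts with the face-to-gaze encoder's gaze loss is a detail of the paper, not shown here.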

URL

https://arxiv.org/abs/2401.13865

PDF

https://arxiv.org/pdf/2401.13865.pdf

