Paper Reading AI Learner

Improving Few-Shot User-Specific Gaze Adaptation via Gaze Redirection Synthesis

2019-04-24 04:38:30
Yu Yu, Gang Liu, Jean-Marc Odobez

Abstract

As an indicator of human attention, gaze is a subtle behavioral cue that can be exploited in many applications. However, inferring 3D gaze direction is challenging even for deep neural networks, given the lack of large amounts of data (ground-truthing gaze is expensive, and existing datasets use different setups) and the inherent presence of gaze biases due to person-specific differences. In this work, we address the problem of person-specific gaze model adaptation from only a few reference training samples. The main and novel idea is to improve gaze adaptation by generating additional training samples through the synthesis of gaze-redirected eye images from existing reference samples. In doing so, our contributions are threefold: (i) we design our gaze redirection framework using synthetic data, allowing us to benefit from aligned training sample pairs to predict accurate inverse mapping fields; (ii) we propose a self-supervised approach for domain adaptation; (iii) we exploit gaze redirection to improve the performance of person-specific gaze estimation. Extensive experiments on two public datasets demonstrate the validity of our gaze retargeting and gaze estimation frameworks.
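The core mechanism the abstract describes is warping-based gaze redirection: a predicted inverse mapping (flow) field resamples a reference eye image to show a new gaze direction. As a minimal, hedged illustration of that idea only (not the authors' implementation), the sketch below applies a dense backward-warping field to an image with bilinear interpolation; the constant flow used here is a stand-in for the field a network would predict.

```python
# Hedged sketch of backward warping with a dense flow field, the building
# block of warping-based gaze redirection. The flow field here is a constant
# shift purely for illustration; in the paper it would be network-predicted.
import numpy as np

def warp_image(img, flow):
    """Backward-warp a grayscale image `img` (H x W) with a flow field
    (H x W x 2). Each output pixel (y, x) samples the input at
    (y + flow[y, x, 0], x + flow[y, x, 1]) via bilinear interpolation."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sy = np.clip(ys + flow[..., 0], 0, h - 1)   # source y coords, clamped
    sx = np.clip(xs + flow[..., 1], 0, w - 1)   # source x coords, clamped
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = sy - y0, sx - x0                   # bilinear weights
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# Toy example: a flow of (-1, -1) everywhere samples from one pixel up-left,
# so the image content shifts down-right by one pixel.
img = np.arange(64, dtype=float).reshape(8, 8)
flow = -np.ones((8, 8, 2))
out = warp_image(img, flow)
```

In the paper's setting, the redirected images produced this way serve as extra training samples for few-shot, person-specific adaptation of the gaze estimator.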


URL

https://arxiv.org/abs/1904.10638

PDF

https://arxiv.org/pdf/1904.10638.pdf

