Paper Reading AI Learner

Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement

2023-05-25 15:15:03
Jiawei Qin, Takuro Shimoyama, Xucong Zhang, Yusuke Sugano

Abstract

Along with the recent development of deep neural networks, appearance-based gaze estimation has achieved considerable success when training and testing within the same domain. Compared to the within-domain task, the variation across domains causes cross-domain performance to drop severely, hindering the deployment of gaze estimation in real-world applications. Among all factors, the ranges of head pose and gaze are believed to play a significant role in the final performance, yet collecting data with large ranges is expensive. This work proposes an effective model training pipeline consisting of training data synthesis and a gaze estimation model for unsupervised domain adaptation. The proposed data synthesis leverages single-image 3D reconstruction to expand the range of head poses in the source domain without requiring a 3D facial shape dataset. To bridge the inevitable gap between synthetic and real images, we further propose an unsupervised domain adaptation method suitable for synthetic full-face data. We propose a disentangling autoencoder network to separate gaze-related features and introduce a background augmentation consistency loss that exploits the characteristics of the synthetic source domain. Through comprehensive experiments, we show that a model trained only on monocular-reconstructed synthetic data can perform comparably to one trained on real data with a large label range. Our proposed domain adaptation approach further improves performance on multiple target domains. The code and data will be available at \url{this https URL}.
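The background augmentation consistency loss mentioned in the abstract can be sketched as follows: since synthetic images come with known face masks, the gaze-related features should be invariant when the background is replaced. This is a minimal illustrative sketch, not the paper's implementation; the linear `encode` stand-in, the function names, and the toy image sizes are all assumptions, with NumPy standing in for a real CNN feature extractor.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(img, W):
    # Stand-in feature extractor: flatten the image and apply a linear
    # projection. In the paper's setting this would be the gaze branch
    # of the disentangling autoencoder.
    return W @ img.ravel()

def augment_background(img, mask, background):
    # Keep pixels inside the face mask, replace everything outside
    # with the new background (possible because synthetic data
    # provides the mask for free).
    return np.where(mask, img, background)

def consistency_loss(img, mask, background, W):
    # Mean squared distance between features of the original image and
    # its background-swapped version; minimizing this pushes the
    # encoder to ignore the background.
    f_orig = encode(img, W)
    f_aug = encode(augment_background(img, mask, background), W)
    return float(np.mean((f_orig - f_aug) ** 2))

# Toy 8x8 grayscale image with a central "face" region.
img = rng.random((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
new_bg = rng.random((8, 8))
W = rng.standard_normal((16, 64)) / 8.0

loss = consistency_loss(img, mask, new_bg, W)
```

Replacing the background with the image itself leaves the input unchanged, so the loss is exactly zero; any genuine background swap yields a positive penalty unless the encoder has learned to discard background information.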

URL

https://arxiv.org/abs/2305.16140

PDF

https://arxiv.org/pdf/2305.16140.pdf
