Paper Reading AI Learner

UVAGaze: Unsupervised 1-to-2 Views Adaptation for Gaze Estimation

2023-12-25 08:13:28
Ruicong Liu, Feng Lu

Abstract

Gaze estimation has become a subject of growing interest in recent research. Most current methods rely on single-view facial images as input. However, these approaches struggle with large head angles, leading to potential inaccuracies in the estimation. To address this issue, adding a second-view camera can help better capture eye appearance. However, existing multi-view methods have two limitations. 1) They require multi-view annotations for training, which are expensive. 2) More importantly, during testing, the exact positions of the multiple cameras must be known and must match those used in training, which limits the application scenario. To address these challenges, we propose a novel 1-view-to-2-views (1-to-2 views) adaptation solution in this paper, the Unsupervised 1-to-2 Views Adaptation framework for Gaze estimation (UVAGaze). Our method adapts a traditional single-view gaze estimator to flexibly placed dual cameras. Here, "flexibly" means the dual cameras can be placed in arbitrary positions regardless of the training setup, without knowing their extrinsic parameters. Specifically, UVAGaze builds a dual-view mutual supervision adaptation strategy, which exploits the intrinsic consistency of gaze directions between the two views. In this way, our method not only benefits from common single-view pre-training, but also achieves more advanced dual-view gaze estimation. The experimental results show that a single-view estimator, when adapted for dual views, can achieve much higher accuracy, especially in cross-dataset settings, with a substantial improvement of 47.0%. Project page: this https URL.
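The mutual-supervision idea rests on a simple geometric fact: both cameras observe the same 3D gaze direction, so the two views' predictions, once expressed in a common coordinate frame, should agree. The sketch below is a minimal illustration of such a consistency loss, not the paper's implementation; the function name, the inter-camera rotation `R_ab` (which UVAGaze would have to estimate, since extrinsics are unknown), and the cosine-distance form are all assumptions.

```python
import torch
import torch.nn.functional as F

def dual_view_consistency_loss(gaze_a: torch.Tensor,
                               gaze_b: torch.Tensor,
                               R_ab: torch.Tensor) -> torch.Tensor:
    """Hypothetical dual-view gaze consistency loss.

    gaze_a, gaze_b: (N, 3) unit gaze vectors predicted from views A and B.
    R_ab: (3, 3) estimated rotation from camera A's frame to camera B's frame
          (an assumption here; the true extrinsics are unknown at test time).
    """
    # Rotate view-A predictions into view-B's coordinate frame.
    gaze_a_in_b = gaze_a @ R_ab.T
    # Penalize disagreement between the two views' estimates of the same gaze.
    cos = F.cosine_similarity(gaze_a_in_b, gaze_b, dim=1)
    return (1.0 - cos).mean()
```

With the identity rotation and identical predictions the loss is zero; as the two views' (rotation-aligned) predictions diverge, the loss grows toward 2, giving each view's estimator a supervision signal from the other without any gaze labels.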


URL

https://arxiv.org/abs/2312.15644

PDF

https://arxiv.org/pdf/2312.15644.pdf

