Paper Reading AI Learner

Improving Domain Generalization on Gaze Estimation via Branch-out Auxiliary Regularization

2024-05-02 16:26:37
Ruijie Zhao, Pinyan Tang, Sihui Luo

Abstract

Despite remarkable advancements, mainstream gaze estimation techniques, particularly appearance-based methods, often suffer from performance degradation in uncontrolled environments due to variations in illumination and individual facial attributes. Existing domain adaptation strategies, limited by their need for target domain samples, may fall short in real-world applications. This letter introduces Branch-out Auxiliary Regularization (BAR), an innovative method designed to boost gaze estimation's generalization capabilities without requiring direct access to target domain data. Specifically, BAR integrates two auxiliary consistency regularization branches: one that uses augmented samples to counteract environmental variations, and another that aligns gaze directions with positive source domain samples to encourage the learning of consistent gaze features. These auxiliary pathways strengthen the core network and are integrated in a smooth, plug-and-play manner, facilitating easy adaptation to various other models. Comprehensive experimental evaluations on four cross-dataset tasks demonstrate the superiority of our approach.
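
The two auxiliary branches described above can be illustrated with a minimal sketch. The abstract does not give the loss functions, weights, or the rule for choosing positive samples, so everything here is an assumption made for illustration: the L1 distance, the weights `w_aug` and `w_pos`, the helper `nearest_positive` (which treats the source sample with the closest gaze label as the "positive"), and the function name `bar_style_loss` are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence, Tuple

Gaze = Tuple[float, float]  # assumed representation: (pitch, yaw) in radians


def l1(a: Gaze, b: Gaze) -> float:
    """L1 distance between two gaze directions (assumed distance measure)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_positive(i: int, labels: Sequence[Gaze]) -> int:
    """Hypothetical positive-sample rule: the other sample with the closest label."""
    candidates = [j for j in range(len(labels)) if j != i]
    return min(candidates, key=lambda j: l1(labels[i], labels[j]))


def bar_style_loss(
    model: Callable[[object], Gaze],
    augment: Callable[[object], object],
    images: Sequence[object],
    labels: Sequence[Gaze],
    w_aug: float = 1.0,  # assumed weight of the augmentation branch
    w_pos: float = 1.0,  # assumed weight of the positive-sample branch
) -> float:
    """Supervised loss plus two consistency terms; needs at least two samples."""
    n = len(images)
    preds = [model(x) for x in images]
    aug_preds = [model(augment(x)) for x in images]

    # Main branch: predictions should match the source-domain labels.
    supervised = sum(l1(p, y) for p, y in zip(preds, labels)) / n
    # Auxiliary branch 1: predictions should be stable under augmentation.
    aug_consistency = sum(l1(p, q) for p, q in zip(preds, aug_preds)) / n
    # Auxiliary branch 2: predictions should agree with those of a positive sample.
    pos_consistency = sum(
        l1(preds[i], preds[nearest_positive(i, labels)]) for i in range(n)
    ) / n

    return supervised + w_aug * aug_consistency + w_pos * pos_consistency
```

In a real training setup, `model` would be a gaze network and `augment` an image transform such as illumination jitter; because the auxiliary terms only add to the loss, they can in principle be attached to an existing model without changing its architecture, which matches the plug-and-play claim in the abstract.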

URL

https://arxiv.org/abs/2405.01439

PDF

https://arxiv.org/pdf/2405.01439.pdf

