Paper Reading AI Learner

Semi-supervised Contrastive Regression for Estimation of Eye Gaze

2023-08-05 04:11:38
Somsukla Maiti, Akshansh Gupta

Abstract

With the increasing demand for human-machine interfaces in intelligent systems, the development of gaze-controlled systems has become a necessity. Gaze, being a non-intrusive form of human interaction, is one of the best-suited approaches. Appearance-based deep learning models are the most widely used approach to gaze estimation. However, the performance of these models depends entirely on the size of the labeled gaze dataset, which in turn limits how well they generalize. This paper aims to develop a semi-supervised contrastive learning framework for estimating gaze direction. Trained with only a small labeled gaze dataset, the framework is able to find a generalized solution even for unseen face images. In this paper, we propose a new contrastive loss paradigm that maximizes the similarity agreement between similar images while reducing the redundancy in the embedding representations. Our contrastive regression framework performs well in comparison with several state-of-the-art contrastive learning techniques used for gaze estimation.
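The abstract does not give the loss formula, so the following is only a minimal sketch of the kind of objective it describes, not the authors' formulation. It assumes: (1) "similarity agreement" is measured as cosine similarity between embeddings of two augmented views of the same face image; (2) "redundancy reduction" is a Barlow Twins-style penalty on the off-diagonal entries of the cross-correlation matrix between the two standardized views, weighted by a hypothetical `lam = 0.005`; (3) the semi-supervised part adds a mean-squared-error regression loss on a small labeled set of 2D gaze angles (e.g. yaw, pitch). The names `agreement_redundancy_loss` and `semi_supervised_loss` are hypothetical, and the paper's contrastive regression may define positive pairs and weights differently.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def _standardize(batch: List[Vector], eps: float = 1e-8) -> List[List[float]]:
    """Normalize each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) + eps
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out


def agreement_redundancy_loss(z_a: List[Vector], z_b: List[Vector], lam: float = 0.005) -> float:
    """Hypothetical loss: pull paired views together, penalize correlated dimensions.

    z_a[k] and z_b[k] are embeddings of two augmented views of the same face image.
    """
    n = len(z_a)
    # Agreement term: near 0 when every pair of views points in the same direction.
    agreement = sum(1.0 - _cosine(a, b) for a, b in zip(z_a, z_b)) / n
    # Redundancy term (assumed, Barlow Twins-style): squared off-diagonal entries
    # of the cross-correlation matrix between the standardized views.
    sa, sb = _standardize(z_a), _standardize(z_b)
    d = len(sa[0])
    redundancy = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                c_ij = sum(sa[k][i] * sb[k][j] for k in range(n)) / n
                redundancy += c_ij ** 2
    return agreement + lam * redundancy


def semi_supervised_loss(pred_gaze, true_gaze, z_a, z_b, weight=1.0, lam=0.005):
    """Hypothetical total: MSE on labeled gaze angles plus the unlabeled contrastive term."""
    errors = [(p - t) ** 2 for pv, tv in zip(pred_gaze, true_gaze) for p, t in zip(pv, tv)]
    mse = sum(errors) / len(errors)
    return mse + weight * agreement_redundancy_loss(z_a, z_b, lam)


if __name__ == "__main__":
    views = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    print(agreement_redundancy_loss(views, views))  # close to 0: aligned, decorrelated
    redundant = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(agreement_redundancy_loss(redundant, redundant))  # about 0.01 (2 * lam)
```

Standardizing each dimension before computing the cross-correlation makes the redundancy term penalize correlated embedding dimensions rather than large activations. In an actual training loop these terms would be computed on tensors with automatic differentiation; the plain-list version here only illustrates the arithmetic.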

URL

https://arxiv.org/abs/2308.02784

PDF

https://arxiv.org/pdf/2308.02784.pdf
