Latent Embedding Clustering for Occlusion Robust Head Pose Estimation

2024-03-29 15:57:38
José Celestino, Manuel Marques, Jacinto C. Nascimento

Abstract

Head pose estimation has become a crucial area of research in computer vision given its usefulness in a wide range of applications, including robotics, surveillance, and driver attention monitoring. One of the most difficult challenges in this field is managing the head occlusions that frequently occur in real-world scenarios. In this paper, we propose a novel and efficient framework that is robust in real-world head occlusion scenarios. In particular, we propose an unsupervised latent embedding clustering with regression and classification components for each pose angle. The model optimizes latent feature representations for occluded and non-occluded images through a clustering term while improving fine-grained angle predictions. Experimental evaluation on in-the-wild head pose benchmark datasets reveals competitive performance in comparison to state-of-the-art methodologies, with the advantage of a significant data reduction. We observe a substantial improvement in occluded head pose estimation. An ablation study is also conducted to ascertain the impact of the clustering term within our proposed framework.

URL

https://arxiv.org/abs/2403.20251

PDF

https://arxiv.org/pdf/2403.20251.pdf
