Abstract
Head pose estimation has become a crucial area of research in computer vision given its usefulness in a wide range of applications, including robotics, surveillance, and driver attention monitoring. One of the most difficult challenges in this field is handling the head occlusions that frequently occur in real-world scenarios. In this paper, we propose a novel and efficient framework that is robust to real-world head occlusion. In particular, we combine unsupervised latent embedding clustering with regression and classification components for each pose angle. The model optimizes latent feature representations for occluded and non-occluded images through a clustering term while improving fine-grained angle predictions. Experimental evaluation on in-the-wild head pose benchmark datasets reveals competitive performance in comparison to state-of-the-art methodologies, with the advantage of a significant data reduction. We observe a substantial improvement in occluded head pose estimation. An ablation study is also conducted to ascertain the impact of the clustering term within the proposed framework.
URL
https://arxiv.org/abs/2403.20251