Mathematical Foundation and Corrections for Full Range Head Pose Estimation

2024-03-26 21:04:18
Huei-Chung Hu, Xuyang Wu, Yuan Wang, Yi Fang, Hsin-Tai Wu

Abstract

Numerous works on head pose estimation (HPE) offer algorithms or neural network-based approaches for extracting Euler angles, either from facial key points or directly from images of the head region. However, many of these works fail to clearly define the coordinate systems and the order of the Euler or Tait-Bryan angles in use. It is well known that rotation matrices depend on the coordinate system, and that yaw, roll, and pitch angles are sensitive to their application order. Without precise definitions, it becomes challenging to validate the correctness of the output head pose and of the drawing routines employed in prior works. In this paper, we thoroughly examine the Euler angles defined in the 300W-LP dataset, head pose estimation methods such as 3DDFA-v2, 6D-RepNet, and WHENet, among others, and the validity of their drawing routines for the Euler angles. When necessary, we infer their coordinate system and their sequence of yaw, roll, and pitch from the provided code. This paper presents (1) code and algorithms for inferring the coordinate system from provided source code, for determining the Euler angle application order, and for extracting precise rotation matrices and Euler angles, (2) code and algorithms for converting poses from one rotation system to another, (3) novel formulae for 2D augmentations of the rotation matrices, and (4) derivations and code for the correct drawing routines for rotation matrices and poses. This paper also addresses the feasibility of defining rotations with a right-handed coordinate system as in Wikipedia and SciPy, which makes Euler angle extraction much easier for full-range head pose research.
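
The abstract's point that yaw, roll, and pitch are sensitive to their application order can be illustrated with a minimal sketch. It is not the paper's code. It assumes a right-handed coordinate system and, purely for illustration, labels rotation about x as pitch, about y as yaw, and about z as roll; the helper names (rot_x, rot_y, rot_z, angles_from_xyz) are hypothetical. It composes the same three angles in two different orders and shows that an extraction formula derived for one order misreads a matrix built with the other.

```python
import math

def rot_x(a):
    """Right-handed rotation about the x axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    """Right-handed rotation about the y axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    """Right-handed rotation about the z axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

def angles_from_xyz(r):
    """Recover (pitch, yaw, roll) assuming R = Rx(pitch) @ Ry(yaw) @ Rz(roll).

    Valid only away from gimbal lock, i.e. |yaw| < 90 degrees.
    """
    yaw = math.asin(max(-1.0, min(1.0, r[0][2])))
    pitch = math.atan2(-r[1][2], r[2][2])
    roll = math.atan2(-r[0][1], r[0][0])
    return pitch, yaw, roll

# Assumed axis labelling (illustration only): pitch -> x, yaw -> y, roll -> z.
pitch, yaw, roll = (math.radians(d) for d in (20.0, 60.0, 10.0))

r_xyz = matmul(matmul(rot_x(pitch), rot_y(yaw)), rot_z(roll))  # R = Rx Ry Rz
r_zyx = matmul(matmul(rot_z(roll), rot_y(yaw)), rot_x(pitch))  # R = Rz Ry Rx

# Same three angles, different order: the matrices are not equal.
order_gap = max_abs_diff(r_xyz, r_zyx)

# The extraction formula matches r_xyz but misreads r_zyx.
recovered = [math.degrees(a) for a in angles_from_xyz(r_xyz)]
misread = [math.degrees(a) for a in angles_from_xyz(r_zyx)]

print("max element difference between orders:", order_gap)
print("angles recovered from Rx Ry Rz:", recovered)
print("angles misread from Rz Ry Rx:", misread)
```

In practice a library such as SciPy (scipy.spatial.transform.Rotation) can take the place of the hand-written helpers; the explicit matrices are used here only to keep the dependence on axis order visible.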

URL

https://arxiv.org/abs/2403.18104

PDF

https://arxiv.org/pdf/2403.18104.pdf

