Bias-Compensated Integral Regression for Human Pose Estimation

2023-01-25 06:54:04
Kerui Gu, Linlin Yang, Michael Bi Mi, Angela Yao

Abstract

In human and hand pose estimation, heatmaps are a crucial intermediate representation for a body or hand keypoint. Two popular methods to decode the heatmap into a final joint coordinate are via an argmax, as done in heatmap detection, or via softmax and expectation, as done in integral regression. Integral regression is learnable end-to-end, but has lower accuracy than detection. This paper uncovers an induced bias in integral regression that results from combining the softmax and the expectation operations. This bias often forces the network to learn degenerately localized heatmaps, which obscures the keypoint's true underlying distribution and leads to lower accuracy. Training-wise, by investigating the gradients of integral regression, we show that the implicit guidance integral regression provides for updating the heatmap makes it slower to converge than detection. To counter these two limitations, we propose Bias-Compensated Integral Regression (BCIR), an integral regression-based framework that compensates for the bias. BCIR also incorporates a Gaussian prior loss to speed up training and improve prediction accuracy. Experimental results on both human body and hand benchmarks show that BCIR is faster to train and more accurate than the original integral regression, making it competitive with state-of-the-art detection methods.
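
The two decoding schemes named in the abstract can be compared with a short NumPy sketch. This is an illustration only, not the authors' implementation and not BCIR itself: the function names, the temperature parameter `beta`, and the synthetic Gaussian heatmap are all assumptions made for the example. It shows one way the softmax-plus-expectation combination can be biased: softmax gives every background pixel a nonzero probability, so the expected coordinate is pulled away from the peak and toward the heatmap center unless the heatmap is made very sharp.

```python
import numpy as np

def decode_argmax(heatmap):
    """Detection-style decoding: (x, y) of the highest-scoring pixel."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(col), float(row)

def decode_integral(heatmap, beta=1.0):
    """Integral-regression-style decoding: softmax, then expected coordinate.

    `beta` is a hypothetical temperature used only to illustrate the effect
    of sharper heatmaps; it is not taken from the paper.
    """
    h, w = heatmap.shape
    logits = beta * heatmap
    logits = logits - logits.max()      # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()                # softmax over all pixels
    x = float((probs.sum(axis=0) * np.arange(w)).sum())  # E[x] from column marginal
    y = float((probs.sum(axis=1) * np.arange(h)).sum())  # E[y] from row marginal
    return x, y

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Synthetic heatmap: an unnormalized Gaussian bump centered at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# A keypoint placed off-center in a 64x64 heatmap.
heatmap = gaussian_heatmap(64, 64, cx=10, cy=12)

x_arg, y_arg = decode_argmax(heatmap)                  # recovers the peak pixel
x_int, y_int = decode_integral(heatmap)                # pulled toward the center
x_sharp, y_sharp = decode_integral(heatmap, beta=100)  # close to the peak again

print("argmax:          ", (x_arg, y_arg))
print("integral:        ", (x_int, y_int))
print("integral (sharp):", (x_sharp, y_sharp))
```

In this toy setting the integral decoder only lands near the true keypoint when the heatmap is sharpened heavily, which is consistent with the abstract's remark that the bias pushes the network toward degenerately localized heatmaps.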

URL

https://arxiv.org/abs/2301.10431

PDF

https://arxiv.org/pdf/2301.10431.pdf

