Abstract
In human and hand pose estimation, heatmaps are a crucial intermediate representation for a body or hand keypoint. Two popular methods to decode the heatmap into a final joint coordinate are an argmax, as done in heatmap detection, and a softmax followed by an expectation, as done in integral regression. Integral regression is learnable end-to-end, but has lower accuracy than detection. This paper uncovers an induced bias in integral regression that results from combining the softmax and the expectation operation. This bias often forces the network to learn degenerately localized heatmaps, which obscures the keypoint's true underlying distribution and leads to lower accuracy. On the training side, by investigating the gradients of integral regression, we show that its implicit guidance for updating the heatmap makes it slower to converge than detection. To counter these two limitations, we propose Bias Compensated Integral Regression (BCIR), an integral regression-based framework that compensates for the bias. BCIR also incorporates a Gaussian prior loss to speed up training and improve prediction accuracy. Experimental results on both human body and hand benchmarks show that BCIR is faster to train and more accurate than the original integral regression, making it competitive with state-of-the-art detection methods.
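The two decoding schemes described above can be illustrated with a small NumPy sketch. It is a toy example under stated assumptions, not the paper's implementation and not BCIR itself: the function names, the 64x64 heatmap, the Gaussian bump, and the `beta` sharpening factor are all hypothetical. It shows one way the softmax-plus-expectation combination can pull the decoded coordinate away from the peak: the softmax gives every pixel a strictly positive weight, so a broad heatmap decodes to a point near the image center unless the heatmap is made very sharp.

```python
import numpy as np

def decode_argmax(heatmap):
    """Detection-style decoding: (x, y) of the maximum response."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return np.array([col, row], dtype=float)

def decode_soft_argmax(heatmap, beta=1.0):
    """Integral-regression-style decoding: softmax, then expected (x, y).

    `beta` is a hypothetical sharpening factor used only for illustration.
    """
    h, w = heatmap.shape
    logits = beta * heatmap
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    ys, xs = np.mgrid[0:h, 0:w]            # pixel coordinate grids
    return np.array([(probs * xs).sum(), (probs * ys).sum()])

# Toy heatmap (assumed): Gaussian bump centered at x=10, y=12.
size = 64
ys, xs = np.mgrid[0:size, 0:size]
heatmap = np.exp(-((xs - 10) ** 2 + (ys - 12) ** 2) / (2 * 2.0 ** 2))

hard = decode_argmax(heatmap)                   # location of the peak
soft = decode_soft_argmax(heatmap)              # pulled toward the image center
sharp = decode_soft_argmax(heatmap, beta=50.0)  # sharp heatmap, close to the peak

print("argmax:", hard)
print("soft-argmax, beta=1:", soft)
print("soft-argmax, beta=50:", sharp)
```

In this sketch the expectation only recovers the peak once the softmax input is strongly sharpened, which is consistent with the abstract's remark that the bias pushes the network toward degenerately localized heatmaps.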
Abstract (translated)
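In human and hand pose estimation, heatmaps are an important intermediate representation of body or hand keypoints. Two common ways to decode a heatmap into final joint coordinates are argmax, as used in heatmap detection, and softmax followed by expectation, as used in integral regression. Integral regression can be learned end-to-end, but it is less accurate than detection. This paper reveals an induced bias in integral regression that arises from combining the softmax and expectation operations. This bias often forces the network to learn degenerately localized heatmaps, which obscures the true underlying distribution of the keypoint and lowers accuracy. On the training side, by studying the gradients of integral regression, we show that its implicit guidance for updating the heatmap makes it converge more slowly than detection. To counter these two limitations, we propose Bias Compensated Integral Regression (BCIR), a framework based on integral regression that compensates for the bias. BCIR also introduces a Gaussian prior loss to speed up training and improve prediction accuracy. Experimental results on human body and hand benchmarks show that BCIR trains faster and is more accurate than the original integral regression, making it competitive with state-of-the-art detection methods.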
URL
https://arxiv.org/abs/2301.10431