Abstract
Estimating 3D human pose from monocular images is a challenging problem due to the variety and complexity of human poses and the inherent ambiguity in recovering depth from a single view. Recent deep-learning-based methods show promising results by using supervised learning on datasets annotated with 3D poses. However, the lack of large-scale 3D-annotated training data captured in in-the-wild settings makes 3D pose estimation difficult for in-the-wild poses. Only a few approaches have used training images from both 3D and 2D pose datasets in a weakly-supervised manner to learn 3D poses in unconstrained settings. In this paper, we propose a method that effectively predicts 3D human pose from 2D pose using a deep neural network trained in a weakly-supervised manner on a combination of ground-truth 3D poses and ground-truth 2D poses. Our method uses re-projection error minimization as a constraint to predict the 3D locations of body joints, which is crucial for training on data where 3D ground truth is not available. Since minimizing re-projection error alone may not guarantee an accurate 3D pose, we also impose additional geometric constraints on the skeleton to regularize the pose in 3D. We demonstrate the superior generalization ability of our method through cross-dataset validation on MPI-INF-3DHP, a challenging 3D benchmark dataset containing in-the-wild 3D poses.
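The abstract does not state the camera model or which geometric constraints are used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a scaled orthographic projection (drop depth, multiply by a scale) for the re-projection term, and uses left/right bone-length symmetry as one example of a skeleton constraint. The 5-joint skeleton, the `weight` value, and all function names are hypothetical. The example shows why re-projection alone is not enough: two poses that differ only in depth project to the same 2D joints, and only the geometric term tells them apart.

```python
import numpy as np

# Hypothetical 5-joint skeleton: 0 pelvis, 1 left hip, 2 right hip, 3 left knee, 4 right knee.
LEFT_BONES = ((0, 1), (1, 3))
RIGHT_BONES = ((0, 2), (2, 4))


def reprojection_error(pose_3d, pose_2d, scale=1.0):
    """Mean squared 2D distance between projected 3D joints and 2D joints.

    Assumes a scaled orthographic camera: drop depth (z) and multiply by `scale`.
    pose_3d has shape (J, 3); pose_2d has shape (J, 2).
    """
    projected = scale * pose_3d[:, :2]
    return float(np.mean(np.sum((projected - pose_2d) ** 2, axis=1)))


def bone_lengths(pose_3d, bones):
    """Euclidean length of each (parent, child) bone."""
    parents = pose_3d[[p for p, _ in bones]]
    children = pose_3d[[c for _, c in bones]]
    return np.linalg.norm(children - parents, axis=1)


def symmetry_penalty(pose_3d, left_bones=LEFT_BONES, right_bones=RIGHT_BONES):
    """Sum of squared differences between matching left and right bone lengths."""
    diff = bone_lengths(pose_3d, left_bones) - bone_lengths(pose_3d, right_bones)
    return float(np.sum(diff ** 2))


def weak_supervision_loss(pred_3d, gt_2d, gt_3d=None, weight=0.1):
    """Re-projection and symmetry terms always; a 3D term only when 3D ground truth exists."""
    loss = reprojection_error(pred_3d, gt_2d) + weight * symmetry_penalty(pred_3d)
    if gt_3d is not None:
        loss += float(np.mean(np.sum((pred_3d - gt_3d) ** 2, axis=1)))
    return loss


# Two 3D poses that differ only in the depth (z) of the right knee.
good_pose = np.array([[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                      [-1.0, -2.0, 0.5], [1.0, -2.0, 0.5]])
bad_pose = good_pose.copy()
bad_pose[4, 2] = 2.0
gt_2d = good_pose[:, :2].copy()  # 2D-only supervision, as for an in-the-wild image

print(reprojection_error(good_pose, gt_2d), reprojection_error(bad_pose, gt_2d))
print(symmetry_penalty(good_pose), round(symmetry_penalty(bad_pose), 3))
```

Both poses have zero re-projection error against the same 2D joints, but only `bad_pose` is penalized by the symmetry term. When a training sample also carries 3D ground truth, passing `gt_3d` adds a direct 3D error term, which mirrors mixing 3D- and 2D-annotated data in one loss.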
URL
https://arxiv.org/abs/1905.01047