Abstract
We present an unsupervised learning approach to recover 3D human pose from 2D skeletal joints extracted from a single image. Our method does not require any multi-view image data, 3D skeletons, or correspondences between 2D and 3D points, nor does it use previously learned 3D priors during training. A lifting network accepts 2D landmarks as inputs and generates a corresponding 3D skeleton estimate. During training, the recovered 3D skeleton is reprojected onto random camera viewpoints to generate new "synthetic" 2D poses. By lifting the synthetic 2D poses back to 3D and reprojecting them in the original camera view, we can define self-consistency losses both in 3D and in 2D. The training can thus be self-supervised by exploiting the geometric self-consistency of the lift-reproject-lift process. We show that self-consistency alone is not sufficient to generate realistic skeletons; however, adding a 2D pose discriminator enables the lifter to output valid 3D poses. Additionally, to learn from 2D poses "in the wild", we train an unsupervised 2D domain adapter network to allow for an expansion of 2D data. This improves results and demonstrates the usefulness of 2D pose data for unsupervised 3D lifting. Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach improves upon previous unsupervised methods by 30% and outperforms many weakly supervised approaches that explicitly use 3D data.
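The lift-reproject-lift cycle described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the orthographic projection, the restriction to rotations about the vertical axis, the mean-squared-error losses, and all function names are assumptions made for illustration. `depth_fn` is a hypothetical stand-in for the lifting network, and the 2D pose discriminator and domain adapter are omitted.

```python
import numpy as np

def rotation_y(theta):
    """Rotation matrix about the vertical (y) axis by angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def random_rotation(rng):
    """Sample a random camera azimuth (assumed viewpoint distribution)."""
    return rotation_y(rng.uniform(-np.pi, np.pi))

def project(points_3d):
    """Orthographic projection (assumption): drop the depth coordinate."""
    return points_3d[:, :2]

def lift(points_2d, depth_fn):
    """Append the depth predicted by depth_fn to every 2D joint."""
    z = depth_fn(points_2d)
    return np.concatenate([points_2d, z[:, None]], axis=1)

def cycle_losses(x2d, depth_fn, R):
    """Return (loss_3d, loss_2d) for one lift-reproject-lift cycle."""
    X = lift(x2d, depth_fn)        # lift the input 2D pose to 3D
    Y = X @ R.T                    # rotate the skeleton into a new view
    y2d = project(Y)               # synthetic 2D pose in that view
    Y_hat = lift(y2d, depth_fn)    # lift the synthetic pose back to 3D
    X_hat = Y_hat @ R              # rotate back to the original view
    x2d_hat = project(X_hat)       # reproject in the original camera
    loss_3d = np.mean((Y_hat - Y) ** 2)      # 3D self-consistency
    loss_2d = np.mean((x2d_hat - x2d) ** 2)  # 2D self-consistency
    return loss_3d, loss_2d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pose_2d = rng.normal(size=(17, 2))            # 17 joints, toy data
    flat_lifter = lambda p: np.zeros(len(p))      # hypothetical lifter: zero depth
    print(cycle_losses(pose_2d, flat_lifter, random_rotation(rng)))
```

In this toy setting, a lifter that predicts zero depth for every joint is perfectly consistent only under the identity rotation; any rotation that moves joints out of the image plane produces a nonzero 3D loss, which is the signal the self-consistency terms provide during training.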
URL
https://arxiv.org/abs/1904.04812