Lifting 2d Human Pose to 3d : A Weakly Supervised Approach

2019-05-03 06:51:11
Sandika Biswas, Sanjana Sinha, Kavya Gupta, Brojeshwar Bhowmick

Abstract

Estimating 3d human pose from monocular images is a challenging problem due to the variety and complexity of human poses and the inherent ambiguity in recovering depth from a single view. Recent deep-learning-based methods show promising results by using supervised learning on 3d pose annotated datasets. However, the lack of large-scale 3d annotated training data captured in in-the-wild settings makes 3d pose estimation difficult for in-the-wild poses. Few approaches have utilized training images from both 3d and 2d pose datasets in a weakly-supervised manner for learning 3d poses in unconstrained settings. In this paper, we propose a method that can effectively predict 3d human pose from 2d pose using a deep neural network trained in a weakly-supervised manner on a combination of ground-truth 3d pose and ground-truth 2d pose. Our method uses re-projection error minimization as a constraint to predict the 3d locations of body joints, which is crucial for training on data where 3d ground truth is not available. Since minimizing re-projection error alone may not guarantee an accurate 3d pose, we also use additional geometric constraints on the skeleton pose to regularize the pose in 3d. We demonstrate the superior generalization ability of our method by cross-dataset validation on the challenging 3d benchmark dataset MPI-INF-3DHP, which contains in-the-wild 3d poses.
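
The abstract's point that re-projection error alone cannot pin down depth can be illustrated with a small NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 5-joint skeleton, the scaled orthographic camera, the left/right bone-length symmetry penalty used as a stand-in for the unspecified "geometric constraints", and the loss weight.

```python
import numpy as np

# Hypothetical 5-joint skeleton for illustration only (not the paper's joint set).
# Each bone is a (parent, child) pair of joint indices.
BONES = [(0, 1), (1, 2), (0, 3), (3, 4)]
# Pairs of bone indices assumed to have equal length (left/right limbs).
SYMMETRIC_BONES = [(0, 2), (1, 3)]


def project_orthographic(pose_3d, scale=1.0):
    """Assumed camera model: scaled orthographic projection that drops depth."""
    return scale * pose_3d[:, :2]


def reprojection_loss(pose_3d, pose_2d, scale=1.0):
    """Mean squared distance between projected 3d joints and the 2d joints."""
    diff = project_orthographic(pose_3d, scale) - pose_2d
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def bone_lengths(pose_3d):
    """Euclidean length of every bone in BONES."""
    parents = np.array([p for p, _ in BONES])
    children = np.array([c for _, c in BONES])
    return np.linalg.norm(pose_3d[children] - pose_3d[parents], axis=1)


def symmetry_loss(pose_3d):
    """Mean squared length difference between paired left/right bones."""
    lengths = bone_lengths(pose_3d)
    left = np.array([a for a, _ in SYMMETRIC_BONES])
    right = np.array([b for _, b in SYMMETRIC_BONES])
    return float(np.mean((lengths[left] - lengths[right]) ** 2))


def weak_loss(pose_3d, pose_2d, weight=0.1):
    """Loss usable without 3d ground truth: re-projection plus a geometric prior."""
    return reprojection_loss(pose_3d, pose_2d) + weight * symmetry_loss(pose_3d)


# A symmetric pose and a copy whose last joint is pushed back in depth.
good = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [-1, 0, 0], [-1, 1, 0]], dtype=float)
bad = good.copy()
bad[4, 2] = 2.0  # wrong depth, identical 2d projection
pose_2d = good[:, :2]

print(reprojection_loss(good, pose_2d), reprojection_loss(bad, pose_2d))
print(weak_loss(good, pose_2d), weak_loss(bad, pose_2d))
```

Both poses project to the same 2d joints, so the re-projection term cannot tell them apart; only the added geometric term penalizes the pose with the implausible depth. In a training setup along the lines the abstract describes, samples with 3d ground truth would presumably use a direct 3d loss, with a loss of this kind reserved for 2d-only samples.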

URL

https://arxiv.org/abs/1905.01047

PDF

https://arxiv.org/pdf/1905.01047.pdf

