Unsupervised 3D Pose Estimation with Geometric Self-Supervision

2019-04-09 17:53:50
Ching-Hang Chen, Ambrish Tyagi, Amit Agrawal, Dylan Drover, Rohith MV, Stefan Stojanov, James M. Rehg

Abstract

We present an unsupervised learning approach to recover 3D human pose from 2D skeletal joints extracted from a single image. Our method does not require any multi-view image data, 3D skeletons, or correspondences between 2D-3D points, nor does it use previously learned 3D priors during training. A lifting network accepts 2D landmarks as inputs and generates a corresponding 3D skeleton estimate. During training, the recovered 3D skeleton is reprojected on random camera viewpoints to generate new "synthetic" 2D poses. By lifting the synthetic 2D poses back to 3D and re-projecting them in the original camera view, we can define self-consistency losses both in 3D and in 2D. The training can thus be self-supervised by exploiting the geometric self-consistency of the lift-reproject-lift process. We show that self-consistency alone is not sufficient to generate realistic skeletons; however, adding a 2D pose discriminator enables the lifter to output valid 3D poses. Additionally, to learn from 2D poses "in the wild", we train an unsupervised 2D domain adapter network to allow for an expansion of 2D data. This improves results and demonstrates the usefulness of 2D pose data for unsupervised 3D lifting. Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach improves upon the previous unsupervised methods by 30% and outperforms many weakly supervised approaches that explicitly use 3D data.
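
The lift-reproject-lift cycle described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes an orthographic camera (the abstract does not state the projection model), rotations about the vertical axis only, and a placeholder depth_fn standing in for the learned lifting network. All function names are hypothetical, and the 2D pose discriminator and the domain adapter are left out.

```python
import numpy as np


def rotation_about_y(angle):
    """Rotation matrix for a turn of `angle` radians about the vertical (y) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def project(pose_3d):
    """Orthographic projection (an assumption): keep x and y, drop depth."""
    return pose_3d[:, :2]


def lift(pose_2d, depth_fn):
    """Lift (J, 2) joints to (J, 3) by appending a per-joint depth from depth_fn."""
    depth = depth_fn(pose_2d)  # shape (J,)
    return np.concatenate([pose_2d, depth[:, None]], axis=1)


def cycle_losses(pose_2d, depth_fn, rotation):
    """Return (loss_3d, loss_2d) for one lift-reproject-lift cycle."""
    pose_3d = lift(pose_2d, depth_fn)              # first lift
    rotated_3d = pose_3d @ rotation.T              # move to the random view
    synthetic_2d = project(rotated_3d)             # "synthetic" 2D pose
    relifted_3d = lift(synthetic_2d, depth_fn)     # second lift
    # 3D consistency: the second lift should recover the rotated skeleton.
    loss_3d = np.mean(np.sum((relifted_3d - rotated_3d) ** 2, axis=1))
    restored_3d = relifted_3d @ rotation           # undo the rotation (R^T per joint)
    reprojected_2d = project(restored_3d)          # back in the original view
    # 2D consistency: the reprojection should match the input joints.
    loss_2d = np.mean(np.sum((reprojected_2d - pose_2d) ** 2, axis=1))
    return loss_3d, loss_2d


def flat_depth(pose_2d):
    """Toy stand-in for the lifter: predicts zero depth for every joint."""
    return np.zeros(len(pose_2d))


toy_pose = np.array([[1.0, 0.0], [0.0, 1.0]])      # two made-up joints
loss_3d, loss_2d = cycle_losses(toy_pose, flat_depth, rotation_about_y(np.pi / 2))
print(loss_3d, loss_2d)
```

In this toy setting the flat lifter passes both checks only when the camera does not move. Any real rotation exposes the missing depth, which is the signal the self-consistency losses provide. In a training setup the placeholder would be replaced by a differentiable network and the two losses minimized together with the adversarial term.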

URL

https://arxiv.org/abs/1904.04812

PDF

https://arxiv.org/pdf/1904.04812.pdf

