Out of the Box: A combined approach for handling occlusion in Human Pose Estimation

2019-04-25 05:10:18
Rohit Jena

Abstract

Human pose estimation is a challenging problem, especially 3D pose estimation from 2D images, because of factors such as occlusion, depth ambiguity, people intertwining with one another, and crowded scenes in general. 2D multi-person human pose estimation in the wild suffers from the same problems: occlusion, ambiguity, and the difficulty of disentangling different people's body parts. As a fundamental problem with many applications, including but not limited to surveillance, low-cost motion capture for video games and movies, and physiotherapy, it is worth solving from both a practical and an intellectual perspective. Although there are cases where no pose estimator can predict with 100% accuracy (cases where even humans would fail), several algorithms have set new state-of-the-art results for human pose estimation in the wild. We review a few algorithms that take different approaches and also formulate our own approach to tackle a persistent problem: occlusion.

URL

https://arxiv.org/abs/1904.11157

PDF

https://arxiv.org/pdf/1904.11157.pdf