Paper Reading AI Learner

Devil in the Details: Towards Accurate Single and Multiple Human Parsing

2018-09-17 02:28:49
Ting Liu, Tao Ruan, Zilong Huang, Yunchao Wei, Shikui Wei, Yao Zhao, Thomas Huang

Abstract

Human parsing has received considerable interest due to its wide application potential. Nevertheless, it remains unclear how to develop an accurate human parsing system in an efficient and elegant way. In this paper, we identify several useful properties, including feature resolution, global context information and edge details, and perform rigorous analyses to reveal how to leverage them to benefit the human parsing task. The advantages of these properties culminate in a simple yet effective Context Embedding with Edge Perceiving (CE2P) framework for single human parsing. Our CE2P is end-to-end trainable and can be easily extended to multiple human parsing. Benefiting from the superiority of CE2P, we achieved 1st place on all three human parsing benchmarks. Without any bells and whistles, we achieved 56.50\% (mIoU), 45.31\% (mean $AP^r$) and 33.34\% ($AP^p_{0.5}$) on LIP, CIHP and MHP v2.0, outperforming the previous state of the art by more than 2.06\%, 3.81\% and 1.87\%, respectively. We hope CE2P will serve as a solid baseline and ease future research in single/multiple human parsing. Code has been made available at \url{https://github.com/liutinglt/CE2P}.
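To make the abstract's three ingredients concrete, below is a minimal PyTorch sketch of how a CE2P-style network might wire them together: a pyramid-pooling context-embedding branch supplies global context, a high-resolution branch preserves fine feature resolution, and an edge branch predicts part boundaries. This is an illustrative, assumption-laden sketch, not the authors' reference implementation (see the linked repository for that); all module names, channel widths and pooling bins here are invented for brevity.

# A minimal sketch of a CE2P-style network. All names and sizes are
# assumptions for illustration, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEmbedding(nn.Module):
    """Pyramid-pooling-style module that injects global context."""
    def __init__(self, in_ch, out_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),
                nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])
        self.fuse = nn.Conv2d(in_ch + len(bins) * out_ch, out_ch,
                              kernel_size=3, padding=1)

    def forward(self, x):
        size = x.shape[-2:]
        # Pool at several scales, project, and upsample back to input size.
        pooled = [F.interpolate(stage(x), size=size, mode="bilinear",
                                align_corners=False)
                  for stage in self.stages]
        return self.fuse(torch.cat([x, *pooled], dim=1))

class CE2PSketch(nn.Module):
    """Three cooperating cues: global context, high resolution, edge detail."""
    def __init__(self, num_classes, low_ch=256, high_ch=2048):
        super().__init__()
        self.context = ContextEmbedding(high_ch, 512)
        # Parsing head sees global context fused with high-resolution features.
        self.parsing_head = nn.Conv2d(512 + low_ch, num_classes, kernel_size=1)
        # Edge head predicts a binary part-boundary map as auxiliary supervision.
        self.edge_head = nn.Conv2d(low_ch, 2, kernel_size=1)

    def forward(self, low_feat, high_feat):
        # low_feat: early, high-resolution backbone features (e.g. stride 4);
        # high_feat: deep, low-resolution backbone features (e.g. stride 16).
        ctx = self.context(high_feat)
        ctx = F.interpolate(ctx, size=low_feat.shape[-2:], mode="bilinear",
                            align_corners=False)
        parsing = self.parsing_head(torch.cat([ctx, low_feat], dim=1))
        edge = self.edge_head(low_feat)
        return parsing, edge  # upsample to input size before computing losses

if __name__ == "__main__":
    low = torch.randn(1, 256, 96, 96)    # dummy high-resolution features
    high = torch.randn(1, 2048, 24, 24)  # dummy deep features
    parsing, edge = CE2PSketch(num_classes=20)(low, high)
    print(parsing.shape, edge.shape)     # (1, 20, 96, 96) (1, 2, 96, 96)

In the paper's full design, the edge branch's features are additionally fused back into the parsing branch for a boundary-refined second prediction; that refinement stage is omitted from this sketch for brevity.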

URL

https://arxiv.org/abs/1809.05996

PDF

https://arxiv.org/pdf/1809.05996.pdf

