Paper Reading AI Learner

A Modular Multi-stage Lightweight Graph Transformer Network for Human Pose and Shape Estimation from 2D Human Pose

2023-01-31 04:42:47
Ayman Ali, Ekkasit Pinyoanuntapong, Pu Wang, Mohsen Dorodchi

Abstract

In this research, we address the challenge faced by existing deep learning-based human mesh reconstruction methods in balancing accuracy and computational efficiency. These methods typically prioritize accuracy, resulting in large network sizes and excessive computational complexity, which may hinder their practical application in real-world scenarios, such as virtual reality systems. To address this issue, we introduce a modular multi-stage lightweight graph-based transformer network for human pose and shape estimation from 2D human pose, a pose-based human mesh reconstruction approach that prioritizes computational efficiency without sacrificing reconstruction accuracy. Our method consists of a 2D-to-3D lifter module that utilizes graph transformers to analyze structured and implicit joint correlations in 2D human poses, and a mesh regression module that combines the extracted pose features with a mesh template to produce the final human mesh parameters.

Abstract (translated)

在本研究中,我们探讨了现有基于深度学习的人脸网格重构方法在平衡精度和计算效率方面所面临的挑战。这些方法通常会注重精度,导致网络规模过大和过度计算复杂度,这可能会影响它们在现实世界场景中的实际应用,如虚拟现实系统。为了解决这个问题,我们引入了一种模块化的多级 lightweight Graph-based Transformer 网络,用于从2D人类姿态中提取姿势和形状估计,提出了一种基于姿势的人脸网格重构方法,该方法注重计算效率而不会牺牲重构精度。我们的方法和一个2D到3D提升模块和一个网格回归模块组成,该模块利用GraphTransformers分析2D人类姿态中的结构化和隐含的关节 correlation,并将它们与网格模板相结合,以产生最终的人脸网格参数。

URL

https://arxiv.org/abs/2301.13403

PDF

https://arxiv.org/pdf/2301.13403.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot