A Modular Multi-stage Lightweight Graph Transformer Network for Human Pose and Shape Estimation from 2D Human Pose

Abstract

Existing deep learning-based methods for human mesh reconstruction struggle to balance accuracy against computational efficiency. They typically prioritize accuracy, which leads to large networks and high computational cost and can hinder their use in real-world applications such as virtual reality systems. To address this, we introduce a modular multi-stage lightweight graph-based transformer network for human pose and shape estimation from 2D human pose: a pose-based human mesh reconstruction approach that prioritizes computational efficiency without sacrificing reconstruction accuracy. Our method consists of two modules. A 2D-to-3D lifter module uses graph transformers to analyze structured and implicit joint correlations in 2D human poses, and a mesh regression module combines the extracted pose features with a mesh template to produce the final human mesh parameters.
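
The two-module pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The 5-joint chain skeleton, the feature size, the random untrained weights, the equal mixing of skeleton-masked and unmasked attention, and the per-vertex offset output are all assumptions made for illustration, and every function name below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes and skeleton (assumptions, not taken from the paper).
NUM_JOINTS, FEAT_DIM, NUM_VERTS = 5, 16, 8
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]  # a simple 5-joint chain


def adjacency(num_joints, edges):
    """Symmetric skeleton adjacency matrix with self-loops."""
    adj = np.eye(num_joints)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init(*shape):
    """Random, untrained weights; a real model would learn these."""
    return rng.normal(scale=0.1, size=shape)


PARAMS = {
    "embed": init(2, FEAT_DIM),
    "wq": init(FEAT_DIM, FEAT_DIM),
    "wk": init(FEAT_DIM, FEAT_DIM),
    "wv": init(FEAT_DIM, FEAT_DIM),
    "to_3d": init(FEAT_DIM, 3),
    "mesh": init(FEAT_DIM + 3, 3),
}


def graph_attention(x, adj, p):
    """One self-attention layer over joints with a residual connection."""
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    scores = q @ k.T / np.sqrt(x.shape[-1])
    # Structured correlations: attention restricted to skeleton neighbours.
    local = softmax(np.where(adj > 0, scores, -np.inf))
    # Implicit correlations: attention over all joint pairs.
    global_ = softmax(scores)
    return x + 0.5 * (local + global_) @ v


def lift_2d_to_3d(pose_2d, adj, p):
    """Lifter module: (J, 2) pose -> (J, D) features and (J, 3) joints."""
    feats = graph_attention(pose_2d @ p["embed"], adj, p)
    return feats, feats @ p["to_3d"]


def regress_mesh(feats, template, p):
    """Mesh module: combine pooled pose features with template vertices."""
    pooled = feats.mean(axis=0)                              # (D,)
    tiled = np.tile(pooled, (len(template), 1))              # (V, D)
    per_vertex = np.concatenate([tiled, template], axis=1)   # (V, D + 3)
    return template + per_vertex @ p["mesh"]                 # (V, 3)


adj = adjacency(NUM_JOINTS, EDGES)
pose_2d = rng.uniform(-1.0, 1.0, size=(NUM_JOINTS, 2))
template = rng.normal(size=(NUM_VERTS, 3))

feats, joints_3d = lift_2d_to_3d(pose_2d, adj, PARAMS)
mesh = regress_mesh(feats, template, PARAMS)
print(joints_3d.shape, mesh.shape)
```

Keeping the lifter and the mesh regressor as separate functions mirrors the modular design the abstract describes: either stage can be swapped or retrained without touching the other.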

URL

https://arxiv.org/abs/2301.13403

PDF

https://arxiv.org/pdf/2301.13403.pdf