Abstract
We present RopeTP, a novel framework that combines Robust pose estimation with a diffusion Trajectory Prior to reconstruct global human motion from videos. At the heart of RopeTP is a hierarchical attention mechanism that significantly improves context awareness, which is essential for accurately inferring the posture of occluded body parts: it exploits the relationships between occluded parts and visible anatomical structures, enhancing the accuracy of local pose estimates. The improved robustness of these local estimates in turn allows precise and stable global trajectories to be reconstructed. Additionally, RopeTP incorporates a diffusion trajectory model that predicts realistic human motion from local pose sequences. This model ensures that the generated trajectories are not only consistent with the observed local actions but also unfold naturally over time, improving the realism and stability of 3D human motion reconstruction. Extensive experimental validation shows that RopeTP surpasses current methods on two benchmark datasets, excelling particularly in scenarios with occlusions. It also outperforms methods that rely on SLAM for initial camera estimates and extensive optimization, delivering more accurate and realistic trajectories.
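The abstract gives no architectural details, but the two-level idea, occluded joints borrowing context from visible anatomy, can be illustrated in code. Below is a minimal, hypothetical PyTorch sketch: joint tokens first attend to the visible joints within their body part, then part-level summaries exchange context across the whole body. The class name, part grouping, masking scheme, and dimensions are all assumptions for illustration, not RopeTP's actual design.

```python
# Hypothetical sketch of hierarchical attention for occlusion-aware pose
# estimation. Part grouping and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class HierarchicalPoseAttention(nn.Module):
    def __init__(self, dim=64, heads=4, parts=None):
        super().__init__()
        # Assumed grouping of 24 SMPL-style joints into 4 coarse body parts.
        self.parts = parts or [list(range(0, 6)), list(range(6, 12)),
                               list(range(12, 18)), list(range(18, 24))]
        self.joint_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.part_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pool = nn.Linear(dim, dim)

    def forward(self, joint_tokens, visible_mask):
        # joint_tokens: (B, J, D) per-joint features; visible_mask: (B, J) bool.
        # Level 1: within each part, every joint (visible or occluded) queries
        # only the visible joints of that part.
        refined = joint_tokens.clone()
        part_feats = []
        for idx in self.parts:
            tok = joint_tokens[:, idx]                  # (B, |part|, D)
            pad = ~visible_mask[:, idx]                 # True = ignore this key
            # Guard: if an entire part is occluded, fall back to self-attention.
            pad = torch.where(pad.all(-1, keepdim=True),
                              torch.zeros_like(pad), pad)
            out, _ = self.joint_attn(tok, tok, tok, key_padding_mask=pad)
            refined[:, idx] = out
            part_feats.append(self.pool(out.mean(dim=1)))  # (B, D) part summary
        # Level 2: parts exchange context, so an occluded limb can borrow
        # evidence from visible anatomical structures elsewhere on the body.
        parts = torch.stack(part_feats, dim=1)          # (B, P, D)
        ctx, _ = self.part_attn(parts, parts, parts)
        # Broadcast part-level context back to the joints of each part.
        for p, idx in enumerate(self.parts):
            refined[:, idx] = refined[:, idx] + ctx[:, p:p + 1]
        return refined

# Usage: 24 joint tokens, joints 18-23 marked occluded.
tokens = torch.randn(2, 24, 64)
vis = torch.ones(2, 24, dtype=torch.bool)
vis[:, 18:] = False
print(HierarchicalPoseAttention()(tokens, vis).shape)  # torch.Size([2, 24, 64])
```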
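Similarly, the diffusion trajectory prior can be pictured as a standard DDPM-style sampler conditioned on local pose features: starting from Gaussian noise, a denoiser iteratively refines a per-frame global root trajectory into a plausible one. The network, noise schedule, and conditioning scheme below are illustrative assumptions, not the paper's design.

```python
# Hypothetical DDPM-style trajectory prior: denoise a noisy global root
# trajectory, conditioned on features from the local pose estimator.
import torch
import torch.nn as nn

T = 100                                   # diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)     # standard linear noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

class TrajectoryDenoiser(nn.Module):
    """Predicts the noise in a trajectory, given pose features and step t."""
    def __init__(self, traj_dim=3, pose_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim + pose_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, traj_dim),
        )

    def forward(self, traj_t, pose_feat, t):
        # traj_t: (B, F, 3) noisy per-frame root positions;
        # pose_feat: (B, F, pose_dim) local pose features; t: (B,) step index.
        t_emb = t.float().view(-1, 1, 1).expand(-1, traj_t.shape[1], 1) / T
        return self.net(torch.cat([traj_t, pose_feat, t_emb], dim=-1))

@torch.no_grad()
def sample_trajectory(model, pose_feat):
    """Ancestral DDPM sampling: start from Gaussian noise, denoise stepwise."""
    B, F, _ = pose_feat.shape
    x = torch.randn(B, F, 3)
    for t in reversed(range(T)):
        ts = torch.full((B,), t, dtype=torch.long)
        eps = model(x, pose_feat, ts)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # (B, F, 3) sampled global trajectory

# Usage with an untrained model, just to show the interface.
model = TrajectoryDenoiser()
traj = sample_trajectory(model, torch.randn(2, 30, 64))
print(traj.shape)  # torch.Size([2, 30, 3])
```

Conditioning the denoiser on the local pose sequence is what makes the sampled trajectory consistent with the observed actions while still unfolding naturally over time.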
URL
https://arxiv.org/abs/2410.20358