Abstract
While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show that our method achieves state-of-the-art performance, outperforming PIP and TIP: it reduces position error from $13.62$ to $10.65\,\mathrm{cm}$ ($22\%$ lower) and lowers jitter from $1.56$ to $0.055\,\mathrm{km/s^3}$ (a reduction of $97\%$).
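To make the idea of synthesizing inter-sensor distances from motion capture data concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that each of the six trackers has a known 3D position in a given frame and that ranging error can be approximated by zero-mean Gaussian noise. The sensor placements, the noise level `sigma_m`, and the function names are all hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def pairwise_distances(positions):
    """Return {(i, j): Euclidean distance} for every unordered sensor pair."""
    return {
        (i, j): math.dist(positions[i], positions[j])
        for i, j in itertools.combinations(range(len(positions)), 2)
    }


def add_ranging_noise(distances, sigma_m=0.05, seed=0):
    """Perturb clean distances with Gaussian noise (an assumed noise model).

    Distances are clamped at zero because a range cannot be negative.
    """
    rng = random.Random(seed)
    return {
        pair: max(0.0, d + rng.gauss(0.0, sigma_m))
        for pair, d in distances.items()
    }


# Hypothetical tracker positions in metres for a single frame.
sensor_positions = [
    (0.0, 0.0, 1.0),    # pelvis (assumed placement)
    (0.0, 0.0, 1.7),    # head (assumed placement)
    (-0.4, 0.0, 1.0),   # left wrist (assumed placement)
    (0.4, 0.0, 1.0),    # right wrist (assumed placement)
    (-0.1, 0.0, 0.5),   # left knee (assumed placement)
    (0.1, 0.0, 0.5),    # right knee (assumed placement)
]

clean = pairwise_distances(sensor_positions)
noisy = add_ranging_noise(clean)

for pair in sorted(clean):
    print(pair, round(clean[pair], 3), round(noisy[pair], 3))
```

With six trackers this yields 15 pairwise distances per frame. In a graph-based model, such distances would naturally sit on the edges between sensor nodes, though how the paper encodes them is not specified in the abstract.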
URL
https://arxiv.org/abs/2404.19541