Paper Reading AI Learner

Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

2024-04-30 13:14:11
Rayan Armani, Changlin Qian, Jiaxi Jiang, Christian Holz

Abstract

While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full-body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full-body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset (UIP-DB) of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data. Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from $13.62$ to $10.65\,cm$ ($22\%$ better) and lowering jitter from $1.56$ to $0.055\,km/s^3$ (a reduction of $97\%$).


URL

https://arxiv.org/abs/2404.19541

PDF

https://arxiv.org/pdf/2404.19541.pdf

