EventEgo3D++: 3D Human Motion Capture from a Head-Mounted Event Camera

2025-02-11 18:57:05
Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Alain Pagani, Didier Stricker, Christian Theobalt, Vladislav Golyanik

Abstract

Monocular egocentric 3D human motion capture remains a significant challenge, particularly under conditions of low lighting and fast movements, which are common in head-mounted device applications. Existing methods that rely on RGB cameras often fail under these conditions. To address these limitations, we introduce EventEgo3D++, the first approach that leverages a monocular event camera with a fisheye lens for 3D human motion capture. Event cameras excel in high-speed scenarios and varying illumination due to their high temporal resolution, providing reliable cues for accurate 3D human motion capture. EventEgo3D++ leverages the LNES representation of event streams to enable precise 3D reconstructions. We have also developed a mobile head-mounted device (HMD) prototype equipped with an event camera, capturing a comprehensive dataset that includes real event observations from both controlled studio environments and in-the-wild settings, in addition to a synthetic dataset. Additionally, to provide a more holistic dataset, we include allocentric RGB streams that offer different perspectives of the HMD wearer, along with their corresponding SMPL body model. Our experiments demonstrate that EventEgo3D++ achieves superior 3D accuracy and robustness compared to existing solutions, even in challenging conditions. Moreover, our method supports real-time 3D pose updates at a rate of 140Hz. This work is an extension of the EventEgo3D approach (CVPR 2024) and further advances the state of the art in egocentric 3D human motion capture. For more details, visit the project page at this https URL.

URL

https://arxiv.org/abs/2502.07869

PDF

https://arxiv.org/pdf/2502.07869.pdf

