E$^3$Pose: Energy-Efficient Edge-assisted Multi-camera System for Multi-human 3D Pose Estimation

Abstract

Multi-human 3D pose estimation plays a key role in establishing a seamless connection between the real world and the virtual world. Recent efforts have adopted a two-stage framework that first performs 2D pose estimation in multiple camera views from different perspectives and then synthesizes the results into 3D poses. However, the focus has largely been on developing new computer vision algorithms for offline video datasets, with little consideration of the energy constraints of real-world systems built from flexibly deployed, battery-powered cameras. In this paper, we propose an energy-efficient edge-assisted multi-camera system, dubbed E$^3$Pose, for real-time multi-human 3D pose estimation, based on the key idea of adaptive camera selection. Instead of always employing all available cameras to perform 2D pose estimation as in existing works, E$^3$Pose adaptively selects only a subset of cameras based on the quality of their views (in terms of occlusion) and their energy states, thereby reducing energy consumption (which translates to extended battery lifetime) and improving estimation accuracy. To achieve this goal, E$^3$Pose incorporates an attention-based LSTM that predicts the occlusion information of each camera view to guide camera selection before the selected cameras process the images of a scene, and it runs a camera selection algorithm based on the Lyapunov optimization framework to make long-term adaptive selection decisions. We build a prototype of E$^3$Pose on a 5-camera testbed, demonstrate its feasibility, and evaluate its performance. Our results show that significant energy savings (up to 31.21%) can be achieved while maintaining 3D pose estimation accuracy comparable to state-of-the-art methods.
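To make the camera selection idea concrete, the following is a minimal sketch of a generic Lyapunov drift-plus-penalty selector. It is not the algorithm from the paper; every name and modeling choice here is a hypothetical assumption: per-camera occlusion scores in [0, 1] are taken as given inputs (standing in for the output of an occlusion predictor such as the LSTM), each camera has a per-slot energy cost and a long-term average energy budget, a virtual queue tracks each camera's accumulated budget overrun, and the weight `v` trades estimation quality against energy. At least two views are required so that 3D triangulation remains possible.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def select_cameras(
    occlusion: Sequence[float],    # hypothetical predicted occlusion per view, 0 = clear
    energy_cost: Sequence[float],  # hypothetical energy a camera spends if selected
    queues: Sequence[float],       # virtual energy-deficit queues Q_i(t)
    v: float,                      # trade-off weight V: larger favors accuracy
    min_cams: int = 2,             # at least two views are needed to triangulate 3D
) -> Tuple[int, ...]:
    """Brute-force the subset minimizing the drift-plus-penalty objective
    V * mean_occlusion(S) + sum_{i in S} Q_i * e_i  (feasible for a handful of cameras)."""
    n = len(occlusion)
    if n < min_cams:
        raise ValueError("not enough cameras to satisfy min_cams")
    best: Tuple[int, ...] = ()
    best_score = float("inf")
    for k in range(min_cams, n + 1):
        for subset in combinations(range(n), k):
            penalty = sum(occlusion[i] for i in subset) / k
            drift = sum(queues[i] * energy_cost[i] for i in subset)
            score = v * penalty + drift
            if score < best_score:
                best, best_score = subset, score
    return best


def update_queues(
    queues: Sequence[float],
    energy_cost: Sequence[float],
    budget: Sequence[float],
    chosen: Sequence[int],
) -> List[float]:
    """Q_i(t+1) = max(Q_i(t) + e_i(t) - b_i, 0), where e_i(t) = 0 if camera i stayed idle."""
    selected = set(chosen)
    return [
        max(q + (e if i in selected else 0.0) - b, 0.0)
        for i, (q, e, b) in enumerate(zip(queues, energy_cost, budget))
    ]


if __name__ == "__main__":
    occ = [0.1, 0.8, 0.2, 0.9, 0.3]   # illustrative occlusion scores for 5 cameras
    cost = [1.0] * 5
    budget = [0.6] * 5
    q = [0.0] * 5
    for slot in range(3):
        chosen = select_cameras(occ, cost, q, v=2.0)
        q = update_queues(q, cost, budget, chosen)
        print(slot, chosen, [round(x, 2) for x in q])
```

With these illustrative numbers, the first slot picks the two least occluded views (0, 2); their queues then grow, so the next slot shifts load to camera 4. This shows the intended behavior of such a scheme: occlusion-aware selection in the short term, with energy use spread across cameras over the long term.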

URL

https://arxiv.org/abs/2301.09015

PDF

https://arxiv.org/pdf/2301.09015.pdf