Abstract
Hand pose estimation from egocentric video has broad implications across various domains, including human-computer interaction, assistive technologies, activity recognition, and robotics, making it a topic of significant research interest. The efficacy of modern machine learning models depends on the quality of their training data. This work therefore analyzes state-of-the-art egocentric datasets suitable for 2D hand pose estimation. We propose a novel protocol for dataset evaluation that covers not only the analysis of stated dataset characteristics and the assessment of data quality, but also the identification of dataset shortcomings through the evaluation of state-of-the-art hand pose estimation models. Our study reveals that, although numerous egocentric datasets intended for 2D hand pose estimation are available, the majority are tailored to specific use cases. There is no ideal benchmark dataset yet; however, the H2O and GANerated Hands datasets emerge as the most promising real and synthetic datasets, respectively.
URL
https://arxiv.org/abs/2409.07337