Paper Reading AI Learner

GestARLite: An On-Device Pointing Finger Based Gestural Interface for Smartphones and Video See-Through Head-Mounts

2019-04-19 14:32:40
Varun Jain, Gaurav Garg, Ramakrishna Perla, Ramya Hebbalaguppe

Abstract

Hand gestures form an intuitive means of interaction in Mixed Reality (MR) applications. However, accurate gesture recognition can be achieved only through state-of-the-art deep learning models or with the use of expensive sensors. Despite the robustness of these deep learning models, they are generally computationally expensive, and obtaining real-time performance on-device remains a challenge. To this end, we propose a novel lightweight hand gesture recognition framework that works in first-person view for wearable devices. The models are trained on a GPU machine and ported to an Android smartphone for use with frugal wearable devices such as the Google Cardboard and VR Box. The proposed hand gesture recognition framework is driven by a cascade of state-of-the-art deep learning models: MobileNetV2 for hand localisation, our custom fingertip regression architecture, and finally a Bi-LSTM model for gesture classification. We extensively evaluate the framework on our EgoGestAR dataset. The overall framework works in real-time on mobile devices and achieves a classification accuracy of 80% on the EgoGestAR video dataset with an average latency of only 0.12 s.
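To make the cascade concrete, below is a minimal PyTorch sketch of the second and third stages: a MobileNetV2-based fingertip regressor feeding a Bi-LSTM gesture classifier. This is not the authors' released implementation; the class names, layer sizes, and `num_gestures=10` are illustrative assumptions, and the hand-localisation stage is elided, with its output standing in as pre-cropped hand frames.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class FingertipRegressor(nn.Module):
    """Illustrative stand-in for the paper's custom fingertip
    regression head: maps a cropped hand image to a normalised
    (x, y) fingertip location via MobileNetV2 features."""
    def __init__(self):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features  # 1280-channel maps
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(
            nn.Linear(1280, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 2), nn.Sigmoid(),  # (x, y) in [0, 1]
        )

    def forward(self, crops):                 # (B, 3, H, W) hand crops
        feats = self.pool(self.backbone(crops)).flatten(1)
        return self.head(feats)               # (B, 2) fingertip coords

class GestureBiLSTM(nn.Module):
    """Mirrors the Bi-LSTM stage: classifies a sequence of
    fingertip points into one of `num_gestures` classes
    (10 is an assumed class count, not from the abstract)."""
    def __init__(self, num_gestures=10, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_gestures)

    def forward(self, points):                # (B, T, 2) fingertip track
        out, _ = self.lstm(points)
        return self.fc(out[:, -1])            # logits at the final step

# Toy end-to-end pass over a 32-frame clip of 224x224 hand crops.
regressor, classifier = FingertipRegressor(), GestureBiLSTM()
clip = torch.randn(32, 3, 224, 224)           # frames from the hand detector
track = regressor(clip).unsqueeze(0)          # (1, 32, 2) fingertip trajectory
logits = classifier(track)
print(logits.shape)                           # torch.Size([1, 10])
```

In this reading, the regressor runs per frame and only the lightweight 2-D trajectory flows into the recurrent stage, which is consistent with the low on-device latency the abstract reports.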


URL

https://arxiv.org/abs/1904.09843

PDF

https://arxiv.org/pdf/1904.09843.pdf
