Pervasive Hand Gesture Recognition for Smartphones using Non-audible Sound and Deep Learning

2021-08-04 16:23:26

Ahmed Ibrahim, Ayman El-Refai, Sara Ahmed, Mariam Aboul-Ela, Hesham M. Eraqi, Mohamed Moustafa

arXiv_CV

arXiv_CV CNN Recognition Detection Deep_Learning Gesture Pose Action

Abstract

Advances in ubiquitous technologies have brought new pervasive methods into practice, enabling innovative features and stimulating research on new forms of human-computer interaction. This paper presents a hand gesture recognition method that uses the smartphone's built-in speakers and microphones. The proposed system emits an ultrasonic, sonar-based signal (inaudible sound) from the smartphone's stereo speakers; the signal is then received by the smartphone's microphone and processed by a Convolutional Neural Network (CNN) for hand gesture recognition. Data augmentation techniques are proposed to improve the detection accuracy, and three dual-channel input fusion methods are compared. The first method merges the dual-channel audio into a single input spectrogram image. The second method adopts early fusion by concatenating the dual-channel spectrograms. The third method adopts late fusion: two convolutional input branches each process one of the dual-channel spectrograms, and their outputs are merged by the last layers. Our experimental results demonstrate promising detection accuracy for the six gestures in our publicly available dataset, with a baseline accuracy of 93.58%.
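
The three input fusion methods can be illustrated by the array shapes each one would feed to a network. The following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify the spectrogram parameters, how the two channels are merged in the first method, the concatenation axis in the second, or the branch architecture in the third, so the frame length, hop size, channel averaging, channel-axis stacking, and the `toy_branch` stand-in are all assumptions made for illustration, as are all function names.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time FFT.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft yields frame_len // 2 + 1 frequency bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, n_frames)


def single_input(stereo):
    """Method 1: merge both channels into one spectrogram image.

    Assumption: the merge is a plain average of the two waveforms.
    """
    return spectrogram(stereo.mean(axis=0))[np.newaxis]  # (1, F, T)


def early_fusion_input(stereo):
    """Method 2: concatenate the per-channel spectrograms into one input.

    Assumption: concatenation is along a leading channel axis.
    """
    return np.stack([spectrogram(ch) for ch in stereo])  # (2, F, T)


def late_fusion_inputs(stereo):
    """Method 3: one spectrogram per branch, merged only after the branches."""
    return [spectrogram(ch)[np.newaxis] for ch in stereo]  # two (1, F, T)


def toy_branch(x, n_features=8):
    """Hypothetical stand-in for a convolutional branch.

    It averages over the channel and time axes and keeps the first
    n_features frequency bins, only to show where the merge happens.
    """
    return x.mean(axis=(0, 2))[:n_features]


# Synthetic stereo recording: 2 channels, 4800 samples of noise.
rng = np.random.default_rng(0)
stereo = rng.standard_normal((2, 4800))

merged = single_input(stereo)
early = early_fusion_input(stereo)
left, right = late_fusion_inputs(stereo)
# Late fusion: branch outputs are concatenated before the final layers.
late_features = np.concatenate([toy_branch(left), toy_branch(right)])

print(merged.shape, early.shape, left.shape, late_features.shape)
```

With these assumed parameters, the first method yields a one-channel image, the second a two-channel image, and the third two one-channel images whose branch outputs are joined afterwards. In a real model, `toy_branch` would be replaced by trained convolutional layers.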

URL

https://arxiv.org/abs/2108.02148

PDF

https://arxiv.org/pdf/2108.02148.pdf