Abstract
The Human-Machine Interaction (HMI) research field is an important topic in machine learning that has been deeply investigated thanks to the rise of computing power in recent years. For the first time, it is possible to use machine learning to classify images and/or videos instead of traditional computer vision algorithms. The aim of this project is to build a symbiosis between a convolutional neural network (CNN) [1] and a recurrent neural network (RNN) [2] to recognize cultural/anthropological Italian sign language gestures from videos. The CNN extracts important features that are later used by the RNN. With RNNs we are able to store temporal information inside the model, providing contextual information from previous frames to enhance prediction accuracy. Our novel approach uses different data augmentation techniques and regularization methods on RGB frames only, to avoid overfitting and provide a small generalization error.
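The CNN-to-RNN pipeline described above can be sketched as follows. This is a minimal illustrative sketch only, not the authors' actual architecture: the per-frame "CNN features" are stand-in random vectors, and all layer sizes, weights, and the Elman-style recurrence are assumptions chosen for brevity. It shows how a hidden state carries temporal context across frames before a final gesture classification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
num_frames, feat_dim, hidden_dim, num_classes = 16, 128, 64, 10

# Stand-in for per-frame CNN features; in the paper these would come
# from the convolutional network applied to each RGB video frame.
frame_features = rng.standard_normal((num_frames, feat_dim))

# Elman-style RNN parameters (a simple recurrence, for illustration).
W_in = rng.standard_normal((hidden_dim, feat_dim)) * 0.01
W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01
W_out = rng.standard_normal((num_classes, hidden_dim)) * 0.01

# The hidden state accumulates contextual information from earlier frames.
h = np.zeros(hidden_dim)
for x in frame_features:
    h = np.tanh(W_in @ x + W_h @ h)

# Classify the whole gesture from the final hidden state.
logits = W_out @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_class = int(np.argmax(probs))
```

In practice the recurrence would typically be an LSTM or GRU trained jointly with (or on top of) the CNN, but the temporal-state idea is the same.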
URL
https://arxiv.org/abs/2109.09396