Online Gesture Recognition using Transformer and Natural Language Processing

Abstract

The Transformer architecture is shown to provide a powerful machine transduction framework for online handwritten gestures corresponding to glyph strokes of natural language sentences. The attention mechanism is successfully used to create latent representations of an end-to-end encoder-decoder model, solving multi-level segmentation while also learning some language features and syntax rules. The additional use of a large decoding space with a learned Byte-Pair Encoding (BPE) is shown to provide robustness to ablated inputs and syntax rules. The encoder stack was fed directly with spatio-temporal data tokens, which potentially form an infinitely large input vocabulary, an approach that finds applications beyond this work. Encoder transfer learning capabilities are also demonstrated on several languages, resulting in faster optimisation and shared parameters. A new supervised dataset of online handwriting gestures, suitable for generic handwriting recognition tasks, was used to successfully train a small Transformer model to an average normalised Levenshtein accuracy of 96% on English or German sentences and 94% on French sentences.
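
The abstract reports results as a normalised Levenshtein accuracy without stating the exact normalisation. As a minimal sketch of how such a score can be computed, the Python below assumes the accuracy is one minus the character-level edit distance divided by the length of the longer of the two strings. That normalisation and the function names `levenshtein` and `normalised_accuracy` are assumptions made for illustration and may differ from the definition used in the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string on the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def normalised_accuracy(reference: str, prediction: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(reference), len(prediction))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    return 1.0 - levenshtein(reference, prediction) / longest


if __name__ == "__main__":
    # One wrong character out of four gives an accuracy of 0.75.
    print(normalised_accuracy("abcd", "abcf"))
```

Under this assumed definition, an average score of 96% would correspond to roughly one character edit per 25 characters of the longer string.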

URL

https://arxiv.org/abs/2305.03407

PDF

https://arxiv.org/pdf/2305.03407.pdf