Abstract
In this paper, we introduce a novel signature for human action recognition: the curvature of a video sequence. The signature models the distribution of sequential data, which enables few-shot learning. Rather than recognizing features within individual images, our algorithm treats an action as a sequence on a universal time scale spanning the whole sequence of images. The video sequence, viewed as a curve in pixel space, is aligned by reparameterizing it with the arclength of that curve. Once the curvatures are obtained, statistical indices are extracted and fed into a learning-based classifier. Overall, our method is simple but powerful. Preliminary experimental results show that the method is effective and achieves state-of-the-art performance in video-based human action recognition. Moreover, we see potential for transferring this idea to other sequence-based recognition applications such as speech recognition, machine translation, and text generation.
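The pipeline described above (curve in pixel space, arclength reparameterization, curvature, summary statistics) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the linear-interpolation resampling, the finite-difference curvature estimate, the sample count `n_samples`, and the particular statistics (mean, standard deviation, minimum, maximum, median) are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def arclength_resample(frames, n_samples=64):
    """Resample a video (T, H, W[, C]) at points equally spaced in arclength.

    Each frame is flattened into one point in pixel space, so the video
    becomes a polyline. Assumption: linear interpolation between frames.
    """
    pts = np.asarray(frames, dtype=np.float64).reshape(len(frames), -1)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # segment lengths
    # Drop repeated frames so that arclength is strictly increasing.
    keep = np.concatenate([[True], seg > 0])
    pts = pts[keep]
    s = np.concatenate([[0.0], np.cumsum(seg[seg > 0])])
    if len(pts) < 2:
        raise ValueError("video has no motion; arclength is zero")
    targets = np.linspace(0.0, s[-1], n_samples)
    # Locate the segment that contains each target arclength value.
    idx = np.clip(np.searchsorted(s, targets, side="right") - 1, 0, len(s) - 2)
    w = (targets - s[idx]) / (s[idx + 1] - s[idx])
    curve = (1.0 - w)[:, None] * pts[idx] + w[:, None] * pts[idx + 1]
    return curve, s[-1]


def discrete_curvature(curve, total_length):
    """Estimate curvature as the norm of the second derivative w.r.t. arclength.

    Assumption: central second differences with uniform step h.
    """
    h = total_length / (len(curve) - 1)
    second = curve[2:] - 2.0 * curve[1:-1] + curve[:-2]
    return np.linalg.norm(second, axis=1) / h**2


def curvature_features(frames, n_samples=64):
    """Summarize the curvature profile as a small feature vector.

    The choice of statistics here is illustrative only.
    """
    curve, length = arclength_resample(frames, n_samples)
    k = discrete_curvature(curve, length)
    return np.array([k.mean(), k.std(), k.min(), k.max(), np.median(k)])
```

The resulting feature vector would then be passed to any standard classifier. Because the curve is resampled by arclength, two recordings of the same motion played at different speeds should yield similar curvature profiles, which is the alignment property the abstract relies on.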
URL
https://arxiv.org/abs/1904.13003