Paper Reading AI Learner

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

2018-07-07 11:18:56
Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

Abstract

Motion representation plays a vital role in human action recognition in videos. In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach. The OFF is derived from the definition of optical flow and is orthogonal to the optical flow. The derivation also provides theoretical support for using the difference between two frames. By directly calculating pixel-wise spatiotemporal gradients of the deep feature maps, the OFF can be embedded in any existing CNN-based video action recognition framework at only a slight additional cost. It enables the CNN to extract spatiotemporal information, especially the temporal information between frames, simultaneously. This simple but powerful idea is validated by experimental results. The network with OFF fed with only RGB inputs achieves a competitive accuracy of 93.3% on UCF-101, which is comparable to the result obtained by two streams (RGB and optical flow) but is 15 times faster. Experimental results also show that OFF is complementary to other motion modalities such as optical flow. When the proposed method is plugged into the state-of-the-art video action recognition framework, it achieves 96.0% and 74.2% accuracy on UCF-101 and HMDB-51 respectively. The code for this project is available at https://github.com/kevin-ssy/Optical-Flow-Guided-Feature.
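The core of OFF as described above is stacking the spatial gradients of a feature map with the temporal difference between two frames' feature maps. The following is a minimal sketch for a single feature channel; the paper applies Sobel operators and element-wise subtraction on deep CNN feature maps at multiple stages, whereas here `np.gradient` (central differences) stands in for the Sobel filters and the function name is illustrative, not from the released code.

```python
import numpy as np

def optical_flow_guided_feature(f_t, f_t_next):
    """Sketch of OFF for one feature channel.

    f_t, f_t_next: 2-D arrays (H, W) holding the same feature
    channel at time t and t + delta_t.
    Returns a (3, H, W) array: [df/dx, df/dy, df/dt].
    """
    # Spatial gradients via central differences
    # (the paper uses Sobel operators on deep feature maps).
    grad_y, grad_x = np.gradient(f_t)
    # Temporal gradient: element-wise difference between the
    # two frames' feature maps.
    grad_t = f_t_next - f_t
    # OFF concatenates the three gradient maps channel-wise.
    return np.stack([grad_x, grad_y, grad_t], axis=0)

# Toy example: a single bright activation shifting one pixel right.
f0 = np.zeros((5, 5)); f0[2, 1] = 1.0
f1 = np.zeros((5, 5)); f1[2, 2] = 1.0
off = optical_flow_guided_feature(f0, f1)
print(off.shape)  # (3, 5, 5)
```

The temporal-difference channel is what gives the theoretical grounding mentioned in the abstract: frame differencing falls out of the optical flow constraint when applied at the feature level.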


URL

https://arxiv.org/abs/1711.11152

PDF

https://arxiv.org/pdf/1711.11152.pdf
