Paper Reading AI Learner

Spatiotemporal Filtering for Event-Based Action Recognition

2019-03-17 12:13:14
Rohan Ghosh, Anupam Gupta, Andrei Nakagawa, Alcimar Soares, Nitish Thakor

Abstract

In this paper, we address the challenging problem of action recognition using event-based cameras. Recognising most gestural actions requires high temporal precision when sampling visual information. Actions are defined by motion, and therefore, when using event-based cameras it is often unnecessary to re-sample the entire scene. Neuromorphic, event-based cameras offer an alternative approach to visual information acquisition by asynchronously time-encoding pixel intensity changes through temporally precise spikes (10-microsecond resolution), making them well equipped for action recognition. However, other challenges exist that are intrinsic to event-based imagers, such as a lower signal-to-noise ratio and spatiotemporally sparse information. One option is to convert event data into frames, but this can result in significant loss of temporal precision. In this work we introduce spatiotemporal filtering in the spike-event domain as an alternative way of channeling spatiotemporal information through to a convolutional neural network. The filters are local spatiotemporal weight matrices, learned from the spike-event data in an unsupervised manner. We find that appropriate spatiotemporal filtering significantly improves CNN performance beyond the state of the art on the event-based DVS Gesture dataset. On our newly recorded action recognition dataset, our method shows significant improvement compared with other, standard ways of generating the spatiotemporal filters.
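The pipeline the abstract describes can be illustrated with a minimal, hypothetical sketch: spike events are binned into a spatiotemporal voxel grid, and a small bank of local spatiotemporal filters is correlated with that grid to produce feature maps for a downstream CNN. This is not the authors' implementation; the event format `(x, y, t, polarity)`, the voxel binning, and the random filters (standing in for the unsupervised-learned weight matrices) are all assumptions for illustration.

```python
import numpy as np

def events_to_voxels(events, H, W, T_bins, t_max):
    """Accumulate (x, y, t, polarity) spike events into a T x H x W voxel grid,
    adding +1 for ON events and -1 for OFF events."""
    vox = np.zeros((T_bins, H, W), dtype=np.float32)
    for x, y, t, p in events:
        b = min(int(t / t_max * T_bins), T_bins - 1)  # temporal bin index
        vox[b, y, x] += 1.0 if p > 0 else -1.0
    return vox

def apply_st_filters(vox, filters):
    """Valid-mode 3D correlation of the voxel grid with each local
    spatiotemporal filter; returns (n_filters, T', H', W') feature maps."""
    ft, fh, fw = filters[0].shape
    T, H, W = vox.shape
    out = np.zeros((len(filters), T - ft + 1, H - fh + 1, W - fw + 1),
                   dtype=np.float32)
    for k, f in enumerate(filters):
        for t in range(out.shape[1]):
            for i in range(out.shape[2]):
                for j in range(out.shape[3]):
                    out[k, t, i, j] = np.sum(vox[t:t+ft, i:i+fh, j:j+fw] * f)
    return out

# Toy example: three events, two random 2x2x2 filters as placeholders
# for the filters the paper learns from spike-event data.
events = [(1, 1, 0.1, 1), (2, 1, 0.5, -1), (2, 2, 0.9, 1)]
vox = events_to_voxels(events, H=4, W=4, T_bins=3, t_max=1.0)
rng = np.random.default_rng(0)
filters = [rng.standard_normal((2, 2, 2)).astype(np.float32) for _ in range(2)]
maps = apply_st_filters(vox, filters)
print(maps.shape)  # (2, 2, 3, 3)
```

The resulting feature maps would then be stacked along the channel dimension and fed to the CNN, in place of naively accumulated event frames.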


URL

https://arxiv.org/abs/1903.07067

PDF

https://arxiv.org/pdf/1903.07067.pdf
