End-to-End Learning of Representations for Asynchronous Event-Based Data

2019-04-17 12:54:37
Daniel Gehrig, Antonio Loquercio, Konstantinos G. Derpanis, Davide Scaramuzza

Abstract

Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it with a standard vision pipeline, e.g., a Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations through a sequence of differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that our approach to learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods.
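Since the abstract only sketches the idea, the snippet below is a minimal, hypothetical PyTorch illustration of what "converting an event stream into a grid through a sequence of differentiable operations" can look like: each event (x, y, t, polarity) contributes to the temporal bins of a voxel grid with a weight produced by a small learnable MLP kernel, so gradients from a downstream task network can flow back into the representation. The class name, kernel architecture, and grid sizes are assumptions made for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class LearnableEventRepresentation(nn.Module):
    """Hypothetical sketch of a differentiable event-to-grid conversion.

    Events (x, y, t, polarity) are accumulated into a (T, H, W) voxel
    grid; each event's contribution to a temporal bin is weighted by a
    small learnable MLP kernel, so the representation can be trained
    end-to-end with a downstream task network. Simplified illustration
    of the idea in the abstract, not the paper's implementation.
    """

    def __init__(self, num_bins=9, height=180, width=240):
        super().__init__()
        self.num_bins, self.height, self.width = num_bins, height, width
        # Learnable temporal kernel: maps the time offset between an
        # event and a bin center to a scalar weight.
        self.kernel = nn.Sequential(
            nn.Linear(1, 30), nn.ReLU(), nn.Linear(30, 1)
        )

    def forward(self, events):
        # events: (N, 4) float tensor with columns (x, y, t, polarity in {-1, +1})
        x = events[:, 0].long()
        y = events[:, 1].long()
        t = events[:, 2]
        p = events[:, 3]
        # Normalize timestamps to [0, 1] within the event window.
        t = (t - t.min()) / (t.max() - t.min() + 1e-9)
        # Flat grid buffer; filled with differentiable scatter-adds.
        grid = events.new_zeros(self.num_bins * self.height * self.width)
        bin_centers = torch.linspace(0, 1, self.num_bins, device=events.device)
        for b in range(self.num_bins):
            # Learned weight of every event for temporal bin b.
            w = self.kernel((t - bin_centers[b]).unsqueeze(1)).squeeze(1)
            # Scatter polarity-signed, kernel-weighted events onto the grid.
            idx = b * self.height * self.width + y * self.width + x
            grid.index_add_(0, idx, p * w)
        return grid.view(self.num_bins, self.height, self.width)
```

Because the grid is produced purely by differentiable operations (kernel evaluation and scatter-add), it can be prepended to any standard CNN and trained jointly with it, which is the end-to-end property the abstract refers to.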

URL

https://arxiv.org/abs/1904.08245

PDF

https://arxiv.org/pdf/1904.08245.pdf

