Abstract
Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it by a standard vision pipeline, e.g., a Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations through a sequence of differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that our approach to learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods.
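One common instance of such a differentiable grid-based representation is a temporal voxel grid, where each event's polarity is spread over discretized time bins by a smooth kernel. The sketch below illustrates the idea with a fixed triangular (bilinear) temporal kernel; the paper's framework would instead learn the kernel, so the function name, the kernel choice, and the NumPy implementation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, height, width, num_bins):
    """Aggregate events (x, y, timestamp, polarity) into a
    (num_bins, height, width) voxel grid.

    Each event's polarity is distributed over the two nearest time
    bins with a triangular kernel, so the accumulation is a smooth
    (differentiable) function of the normalized timestamps.
    NOTE: a fixed triangular kernel is an assumption for illustration;
    the framework described above learns this kernel end-to-end.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    # Normalize timestamps to the bin range [0, num_bins - 1].
    span = max(ts.max() - ts.min(), 1e-9)
    t = (ts - ts.min()) / span * (num_bins - 1)
    for b in range(num_bins):
        # Triangular kernel: weight decays linearly with temporal
        # distance to bin center b; zero beyond one bin width.
        w = np.maximum(0.0, 1.0 - np.abs(t - b))
        # Scatter-add weighted polarities at each event's pixel.
        np.add.at(grid[b], (ys, xs), ps * w)
    return grid
```

Because the triangular weights over all bins sum to one for any timestamp inside the range, the total polarity mass of the event stream is preserved in the grid, which can then be fed to a standard CNN.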
URL
https://arxiv.org/abs/1904.08245