Abstract
Event cameras provide visual perception with high temporal precision, low data rates, and high dynamic range, properties that are well suited to optical flow estimation. While data-driven optical flow estimation has achieved great success with RGB cameras, its generalization performance is severely hindered for event cameras, mainly due to limited and biased training data. In this paper, we present a novel simulator, BlinkSim, for the fast generation of large-scale data for event-based optical flow. BlinkSim consists of a configurable rendering engine and a flexible engine for event data simulation. By leveraging the wealth of current 3D assets, the rendering engine enables us to automatically build thousands of scenes with different objects, textures, and motion patterns, and to render very high-frequency images for realistic event data simulation. Based on BlinkSim, we construct a large training dataset and evaluation benchmark, BlinkFlow, which contains sufficient, diverse, and challenging event data with optical flow ground truth. Experiments show that BlinkFlow improves the generalization performance of state-of-the-art methods by more than 40% on average and by up to 90%. Moreover, we propose an Event optical Flow transFormer (E-FlowFormer) architecture. Powered by our BlinkFlow, E-FlowFormer outperforms SOTA methods by up to 91% on the MVSEC dataset and 14% on the DSEC dataset, showing the best generalization performance.
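The abstract does not detail how high-frequency rendered frames become event data. As a rough illustration of what such a simulation engine computes, the sketch below implements the standard log-intensity contrast-threshold event model (ESIM-style) over a rendered frame sequence. The function name `frames_to_events`, the `threshold` value, and the assignment of all events in a frame interval to that interval's end timestamp are illustrative assumptions, not BlinkSim's actual API.

```python
import numpy as np

def frames_to_events(frames, timestamps, threshold=0.2, eps=1e-3):
    """Convert a high-frame-rate intensity sequence into a list of events.

    Standard log-intensity contrast model: an event (x, y, t, polarity) is
    fired each time the log intensity at a pixel changes by `threshold`
    relative to the value at that pixel's last event.
    """
    log_ref = np.log(frames[0].astype(np.float64) + eps)  # per-pixel reference
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_cur = np.log(frame.astype(np.float64) + eps)
        diff = log_cur - log_ref
        # Integer number of threshold crossings at each pixel since its reference.
        num = np.floor(np.abs(diff) / threshold).astype(int)
        ys, xs = np.nonzero(num)
        for y, x in zip(ys, xs):
            polarity = 1 if diff[y, x] > 0 else -1
            # Simplification: all events in this interval share timestamp t.
            events.extend([(x, y, t, polarity)] * num[y, x])
            # Advance the reference by the crossings that were emitted.
            log_ref[y, x] += polarity * num[y, x] * threshold
    return events
```

Real event simulators additionally interpolate event timestamps within each frame interval and model per-pixel threshold noise; the sketch collapses each interval onto its end timestamp, which is why very high-frequency input frames are needed for realistic output.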
Abstract (translated)
Event cameras provide visual perception with high temporal precision, low data rates, and high dynamic range, making them well suited for optical flow estimation. Although data-driven optical flow estimation has achieved great success with RGB cameras, its generalization performance for event cameras is severely hindered, mainly because of limited and biased training data. In this paper, we introduce a novel simulator, BlinkSim, for the fast generation of large-scale event-based optical flow data. BlinkSim consists of a configurable rendering engine and a flexible event data simulation engine. By exploiting the wealth of existing 3D assets, the rendering engine allows us to automatically construct thousands of scenes with different objects, textures, and motion patterns, and to render very high-frequency images for realistic event data simulation. Based on BlinkSim, we build a large training dataset and evaluation benchmark, BlinkFlow, which contains sufficient, diverse, and challenging event data with optical flow ground truth. Experiments show that BlinkFlow improves the generalization performance of existing methods by more than 40% on average and by up to 90%. In addition, we propose an Event optical Flow transFormer (E-FlowFormer) architecture. Powered by our BlinkFlow, E-FlowFormer outperforms SOTA methods by up to 91% on the MVSEC dataset and 14% on the DSEC dataset, showing the best generalization performance.
URL
https://arxiv.org/abs/2303.07716