Abstract
Event cameras unlock new frontiers that were previously unthinkable with standard frame-based cameras. One notable example is low-latency motion estimation (optical flow), which is critical for many real-time applications. In such applications, the computational efficiency of algorithms is paramount. Although recent deep learning paradigms such as CNNs, RNNs, and ViTs have shown remarkable performance, they often lack the desired computational efficiency. Conversely, asynchronous event-based methods, including SNNs and GNNs, are computationally efficient; however, they fail to capture sufficient spatio-temporal information, which is required for strong optical flow estimation. In this work, we introduce the Spatio-Temporal State Space Model (STSSM) module, along with a novel network architecture, to develop a highly efficient solution with competitive performance. Our STSSM module leverages state-space models to effectively capture spatio-temporal correlations in event data, offering higher performance at lower complexity than ViT- and CNN-based architectures in similar settings. Our model achieves 4.5x faster inference and 8x lower computation than TMA, and 2x lower computation than EV-FlowNet, with competitive performance on the DSEC benchmark. Our code will be available at this https URL
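To make the core idea concrete, below is a minimal sketch of how a state-space recurrence can be scanned over the temporal axis of event voxel features. The class name SimpleSSMBlock, the diagonal parameterization, and the state dimension are illustrative assumptions for exposition only; the abstract does not specify the paper's actual STSSM design.

    import torch
    import torch.nn as nn

    class SimpleSSMBlock(nn.Module):
        """Hypothetical diagonal state-space scan over the temporal axis of an
        event voxel grid of shape (B, T, C, H, W). A sketch, not the paper's STSSM."""
        def __init__(self, channels: int, state_dim: int = 16):
            super().__init__()
            # Per-channel dynamics: h_t = A * h_{t-1} + B * x_t,  y_t = C^T h_t
            self.log_a = nn.Parameter(torch.zeros(channels, state_dim))  # A = exp(-softplus(.)) keeps |A| < 1
            self.b = nn.Parameter(torch.randn(channels, state_dim) * 0.1)
            self.c = nn.Parameter(torch.randn(channels, state_dim) * 0.1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            B, T, C, H, W = x.shape
            a = torch.exp(-nn.functional.softplus(self.log_a))  # (C, S), stable decay
            h = x.new_zeros(B, C, self.b.shape[1], H, W)        # hidden state per pixel
            ys = []
            for t in range(T):  # sequential scan; practical SSMs use a parallel scan for speed
                xt = x[:, t].unsqueeze(2)                       # (B, C, 1, H, W)
                h = a.view(1, C, -1, 1, 1) * h + self.b.view(1, C, -1, 1, 1) * xt
                ys.append((self.c.view(1, C, -1, 1, 1) * h).sum(dim=2))
            return torch.stack(ys, dim=1)                       # (B, T, C, H, W)

    # Usage: scan 5 temporal bins of a 32-channel event voxel grid.
    out = SimpleSSMBlock(channels=32)(torch.randn(2, 5, 32, 64, 64))

The appeal of this family of models, and plausibly the motivation for STSSM, is that the linear recurrence aggregates temporal context at cost linear in sequence length, unlike the quadratic attention of ViT-style blocks.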
URL
https://arxiv.org/abs/2506.07878