Abstract
Optical flow estimation is a classical task that is important to the vision community. Classical methods take two frames as input, while some recent methods consider multiple frames to explicitly model long-range information. The former limits the ability to fully leverage temporal coherence along the video sequence; the latter incurs heavy computational overhead, typically making real-time flow estimation infeasible. Some multi-frame approaches even require unseen future frames for the current estimate, compromising real-time applicability in safety-critical scenarios. To this end, we present MemFlow, a real-time method for optical flow estimation and prediction with memory. Our method introduces memory read-out and update modules that aggregate historical motion information in real time. Furthermore, we integrate resolution-adaptive re-scaling to accommodate diverse video resolutions. In addition, our approach extends seamlessly to predicting future optical flow from past observations. Leveraging effective historical motion aggregation, our method outperforms VideoFlow in generalization performance on the Sintel and KITTI-15 datasets, with fewer parameters and faster inference. At the time of submission, MemFlow also leads in performance on the 1080p Spring dataset. Code and models will be available at: this https URL.
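The abstract describes memory read-out and update modules that aggregate historical motion information frame by frame. A common way to realize such a motion memory is attention-based read-out over stored key/value features plus a bounded first-in-first-out update. The sketch below illustrates that generic pattern only; the function names, shapes, and FIFO update rule are assumptions for illustration, not MemFlow's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_readout(query, mem_keys, mem_values):
    """Cross-attention read-out: each query feature attends over
    the memory keys and aggregates the stored motion values.
    query: (N, C); mem_keys: (M, C); mem_values: (M, C)."""
    attn = softmax(query @ mem_keys.T / np.sqrt(query.shape[-1]))
    return attn @ mem_values  # (N, C)

def memory_update(mem_keys, mem_values, new_key, new_value, max_len=4):
    """Append the newest frame's features and keep only the most
    recent `max_len` entries (hypothetical FIFO budget, so the cost
    per frame stays constant for real-time use)."""
    mem_keys = np.concatenate([mem_keys, new_key])[-max_len:]
    mem_values = np.concatenate([mem_values, new_value])[-max_len:]
    return mem_keys, mem_values
```

Bounding the memory length is one plausible way to keep per-frame cost constant, which the abstract's real-time claim suggests is a design requirement; the actual aggregation strategy is specified in the paper itself.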
URL
https://arxiv.org/abs/2404.04808