Abstract
Optical flow estimation is crucial to a variety of vision tasks. Despite substantial recent advancements, real-time on-device optical flow estimation remains a complex challenge. First, an optical flow model must be lightweight enough to meet the computation and memory constraints that real-time on-device performance imposes. Second, these constraints weaken the model's capacity to handle ambiguities in flow estimation, which makes flow accuracy harder to preserve. This paper introduces two synergistic techniques, Self-Cleaning Iteration (SCI) and Regression Focal Loss (RFL), designed to enhance the capabilities of optical flow models, with a focus on addressing optical flow regression ambiguities. These techniques prove particularly effective in mitigating error propagation, a prevalent issue in optical flow models that employ iterative refinement. Notably, they add negligible to zero overhead in model parameters and inference latency, thereby preserving real-time on-device efficiency. The effectiveness of the proposed SCI and RFL techniques, collectively referred to as SciFlow for brevity, is demonstrated across two distinct lightweight optical flow model architectures in our experiments. SciFlow reduces the error metrics (EPE and Fl-all) relative to the baseline models by up to 6.3% and 10.5% for in-domain scenarios and by up to 6.2% and 13.5% for cross-domain scenarios on the Sintel and KITTI 2015 datasets, respectively.
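For readers unfamiliar with the two error metrics named in the abstract, the following is a minimal NumPy sketch of how EPE (average end-point error) and Fl-all (the KITTI 2015 outlier percentage) are commonly computed. It is not taken from the paper: the function names `flow_metrics` and `relative_reduction` are hypothetical, and it assumes the percentages quoted in the abstract are relative reductions with respect to the baseline.

```python
import numpy as np


def flow_metrics(pred, gt, valid=None):
    """Return (EPE, Fl-all) for one flow field.

    pred, gt: arrays of shape (H, W, 2) holding (u, v) flow in pixels.
    valid:    optional boolean mask of shape (H, W); all pixels if None.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = np.ones(gt.shape[:2], dtype=bool)

    # Per-pixel end-point error: Euclidean distance between flow vectors.
    err = np.linalg.norm(pred - gt, axis=-1)[valid]
    # Ground-truth flow magnitude, used by the relative outlier threshold.
    mag = np.linalg.norm(gt, axis=-1)[valid]

    epe = err.mean()
    # KITTI 2015 convention: a pixel is an outlier when its error exceeds
    # both 3 px and 5% of the ground-truth magnitude.
    outlier = (err > 3.0) & (err > 0.05 * mag)
    fl_all = 100.0 * outlier.mean()
    return epe, fl_all


def relative_reduction(baseline, improved):
    """Percentage by which `improved` lowers `baseline` (assumed definition)."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy data, not results from the paper: one of four pixels is off by (3, 4).
    gt = np.zeros((2, 2, 2))
    pred = gt.copy()
    pred[0, 0] = [3.0, 4.0]
    print(flow_metrics(pred, gt))
    print(relative_reduction(2.0, 1.0))
```

Both metrics are lower-is-better, so a positive value from `relative_reduction` means the modified model improves on the baseline.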
URL
https://arxiv.org/abs/2404.08135