Abstract
In this paper, we propose ProTracker, a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. The key idea of our method is to incorporate probabilistic integration that refines multiple predictions from both optical flow and semantic features, enabling robust short-term and long-term tracking. Specifically, we integrate optical flow estimations in a probabilistic manner, producing smooth and accurate trajectories by maximizing the likelihood of each prediction. To effectively re-localize challenging points that disappear and reappear due to occlusion, we further incorporate long-term feature correspondence into our flow predictions for continuous trajectory generation. Extensive experiments show that ProTracker achieves state-of-the-art performance among unsupervised and self-supervised approaches, and even outperforms supervised methods on several benchmarks. Our code and model will be publicly available upon publication.
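To make the idea of probabilistic integration concrete, the following is a minimal sketch of one common way to fuse several noisy point predictions: inverse-variance (precision) weighting, which is the maximum-likelihood estimate under independent Gaussian noise. This is an illustrative assumption, not the paper's actual formulation; the function name `fuse_predictions` and the per-prediction variance inputs are hypothetical.

```python
import numpy as np

def fuse_predictions(means, variances):
    """Fuse K noisy 2D point predictions by inverse-variance weighting.

    Under independent Gaussian noise, this weighted mean maximizes the
    joint likelihood of the observations (illustrative sketch only).
    """
    means = np.asarray(means, dtype=float)                  # (K, 2) predicted positions
    precisions = 1.0 / np.asarray(variances, dtype=float)   # (K,) inverse variances
    weights = precisions / precisions.sum()                 # normalized confidence weights
    fused = (weights[:, None] * means).sum(axis=0)          # precision-weighted mean
    fused_var = 1.0 / precisions.sum()                      # variance of the fused estimate
    return fused, fused_var

# Example: three flow-based predictions of the same point,
# with the middle one being the most confident (lowest variance).
preds = [[10.0, 20.0], [10.4, 19.8], [9.8, 20.2]]
variances = [1.0, 0.5, 2.0]
point, var = fuse_predictions(preds, variances)
```

More confident predictions pull the fused position toward themselves, and the fused variance is always smaller than any single input variance, which is the intuition behind combining multiple flow estimations into a smoother trajectory.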
URL
https://arxiv.org/abs/2501.03220