Abstract
We consider the problem of segmenting objects in videos based on their motion and no other forms of supervision. Prior work has often approached this problem by using the principle of common fate, namely the fact that the motion of points that belong to the same object is strongly correlated. However, most authors have only considered instantaneous motion from optical flow. In this work, we present a way to train a segmentation network using long-term point trajectories as a supervisory signal to complement optical flow. The key difficulty is that long-term motion, unlike instantaneous motion, is difficult to model -- any parametric approximation is unlikely to capture complex motion patterns over long periods of time. We instead draw inspiration from subspace clustering approaches, proposing a loss function that seeks to group the trajectories into low-rank matrices where the motion of object points can be approximately explained as a linear combination of other point tracks. Our method outperforms the prior art on motion-based segmentation, which shows the utility of long-term motion and the effectiveness of our formulation.
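To make the low-rank grouping idea concrete, the following is a minimal illustrative sketch in PyTorch (not the authors' code) of one way such a trajectory loss could be written, using the nuclear norm as a convex surrogate for matrix rank. The tensor shapes, the soft-mask weighting, and the function name low_rank_grouping_loss are assumptions made purely for illustration.

# Illustrative sketch only: a low-rank grouping objective over long-term
# point trajectories, with the nuclear norm standing in for matrix rank.
import torch

def low_rank_grouping_loss(tracks: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    # tracks: (N, 2T) matrix stacking the (x, y) positions of N point
    #         trajectories over T frames.
    # masks:  (N, K) soft assignment of each trajectory to one of K groups,
    #         e.g. the segmentation network's output sampled at track locations.
    # Returns a scalar that is small when each group's trajectories span a
    # low-dimensional subspace, i.e. when each track is approximately a
    # linear combination of the other tracks in its group (common fate).
    loss = tracks.new_zeros(())
    for k in range(masks.shape[1]):
        # Weight each trajectory by its soft membership in group k.
        weighted = masks[:, k:k + 1] * tracks              # (N, 2T)
        # Nuclear norm (sum of singular values) as a rank surrogate.
        loss = loss + torch.linalg.matrix_norm(weighted, ord='nuc')
    return loss

In a training loop, a loss of this kind would be minimized jointly with a flow-based term, so that the predicted masks group trajectories whose stacked motion matrices are approximately low rank.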
URL
https://arxiv.org/abs/2501.12392