Abstract
The objective of this paper is motion segmentation: discovering and segmenting the moving objects in a video. This is a much-studied area with numerous careful, and sometimes complex, approaches and training schemes, including self-supervised learning, learning from synthetic datasets, object-centric representations, amodal representations, and many more. Our interest in this paper is to determine whether the Segment Anything model (SAM) can contribute to this task. We investigate two models for combining SAM with optical flow that harness the segmentation power of SAM together with the ability of flow to discover and group moving objects. In the first model, we adapt SAM to take optical flow, rather than RGB, as an input. In the second, SAM takes RGB as an input, and flow is used as a segmentation prompt. These surprisingly simple methods, without any further modifications, outperform all previous approaches by a considerable margin on both single- and multi-object benchmarks. We also extend these frame-level segmentations to sequence-level segmentations that maintain object identity. Again, this simple model outperforms previous methods on multiple video object segmentation benchmarks.
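To make the first model's input concrete, the sketch below shows one common way to turn a dense flow field into a three-channel image that an RGB-trained segmenter could consume: direction is mapped to hue and magnitude to saturation. This is an illustrative assumption, not the preprocessing confirmed by the abstract; the function name `flow_to_rgb`, the `max_magnitude` parameter, and the nested-list flow layout are hypothetical, and only the Python standard library is used.

```python
import colorsys
import math


def flow_to_rgb(flow, max_magnitude=None):
    """Convert a flow field to an RGB image (hypothetical helper).

    flow: H x W nested list of (u, v) displacement pairs.
    Returns an H x W nested list of (r, g, b) integer triples in 0..255.
    Static pixels come out white; faster pixels are more saturated.
    """
    if max_magnitude is None:
        # Normalise by the largest displacement in this frame (an assumption).
        max_magnitude = max(math.hypot(u, v) for row in flow for (u, v) in row)
    max_magnitude = max(max_magnitude, 1e-6)  # avoid division by zero

    image = []
    for row in flow:
        out_row = []
        for u, v in row:
            # Direction in [-pi, pi] mapped to hue in [0, 1].
            hue = (math.atan2(v, u) + math.pi) / (2 * math.pi)
            # Relative speed mapped to saturation in [0, 1].
            sat = min(math.hypot(u, v) / max_magnitude, 1.0)
            r, g, b = colorsys.hsv_to_rgb(hue, sat, 1.0)
            out_row.append((round(r * 255), round(g * 255), round(b * 255)))
        image.append(out_row)
    return image


# Toy example: a static 4x4 background with one pixel moving to the right.
flow = [[(0.0, 0.0)] * 4 for _ in range(4)]
flow[1][2] = (3.0, 0.0)
rgb = flow_to_rgb(flow)
```

In this toy frame the background maps to a uniform colour and the single moving pixel stands out, which is the property that lets a segmentation model group pixels by motion rather than appearance.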
URL
https://arxiv.org/abs/2404.12389