Abstract
In this paper, we address the challenges of unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or on modeling temporal relations, our method integrates both within a unified framework. MTNet merges appearance and motion features during feature extraction within the encoders, promoting a more complementary representation. To capture the long-range contextual dynamics embedded within videos, a temporal transformer module is introduced, facilitating effective inter-frame interactions throughout a video clip. Furthermore, we employ a cascade of decoders across all feature levels to exploit the derived features and generate increasingly precise segmentation masks. As a result, MTNet provides a strong and compact framework that explores both temporal and cross-modality knowledge to localize and track the primary object robustly, accurately, and efficiently in various challenging scenarios. Extensive experiments across diverse benchmarks show that our method not only attains state-of-the-art performance in unsupervised video object segmentation but also delivers competitive results in video salient object detection. These findings highlight the method's versatility and its ability to adapt to a range of segmentation tasks. Source code is available online.
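The two core ideas in the abstract, fusing appearance with motion features and then letting frames interact over time, can be illustrated with a toy sketch. This is not the MTNet implementation: the function names (`fuse`, `temporal_attention`), the element-wise sum used for fusion, and the projection-free attention (Q = K = V) are all assumptions chosen for brevity, and each frame is reduced to a single short feature vector.

```python
import math


def fuse(appearance, motion):
    """Hypothetical fusion step: element-wise sum of two feature vectors.

    The abstract does not specify the fusion operator; a sum is assumed here.
    """
    return [a + m for a, m in zip(appearance, motion)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames):
    """Scaled dot-product self-attention across frames.

    Assumes Q = K = V = the input features (no learned projections),
    so every output frame is a convex combination of the input frames.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    output = []
    for query in frames:
        # Similarity of this frame to every frame in the clip.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights = softmax(scores)
        # Weighted mix of all frames, computed per feature dimension.
        mixed = [sum(w * frame[i] for w, frame in zip(weights, frames)) for i in range(dim)]
        output.append(mixed)
    return output


# Toy clip: three frames, each with a 2-dimensional feature vector per modality.
appearance = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
motion = [[0.5, 0.5], [0.5, 0.5], [0.0, 0.0]]

fused = [fuse(a, m) for a, m in zip(appearance, motion)]
attended = temporal_attention(fused)
```

In this sketch `attended` has the same shape as `fused`, and each attended frame blends information from every frame in the clip. A real model would use learned projections, multiple heads, and spatial feature maps rather than one vector per frame.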
URL
https://arxiv.org/abs/2501.07806