Abstract
Video object segmentation (VOS) aims to distinguish and track target objects in a video. Despite the excellent performance achieved by off-the-shelf VOS models, existing VOS benchmarks mainly focus on short-term videos lasting about 5 seconds, where objects remain visible most of the time. These benchmarks poorly represent practical applications, and the absence of long-term datasets restricts further investigation of VOS in realistic scenarios. We therefore propose a novel benchmark named LVOS, comprising 720 videos with 296,401 frames and 407,945 high-quality annotations. Videos in LVOS last 1.14 minutes on average, approximately 5 times longer than videos in existing datasets. Each video includes various attributes, especially challenges arising in the wild, such as long-term reappearing objects and cross-temporal similar objects. Compared to previous benchmarks, LVOS better reflects the performance of VOS models in real scenarios. Based on LVOS, we evaluate 20 existing VOS models under 4 different settings and conduct a comprehensive analysis. On LVOS, these models suffer a large performance drop, highlighting the challenge of achieving precise tracking and segmentation in real-world scenarios. Attribute-based analysis indicates that the key factor in the accuracy decline is the increased video length, emphasizing the crucial role of LVOS. We hope LVOS can advance the development of VOS in real scenes. Data and code are available at this https URL.
URL
https://arxiv.org/abs/2404.19326