Abstract
Increasing the efficiency of trajectory annotation from videos has the potential to enable the next generation of data-hungry tracking algorithms to thrive on large-scale datasets. Despite the importance of this task, very few works currently explore how to label tracking datasets comprehensively and efficiently. In this work, we introduce SPAM, a tracking data engine that provides high-quality labels with minimal human intervention. SPAM is built around two key insights: i) most tracking scenarios can be easily resolved, so we use a pre-trained model to generate high-quality pseudo-labels and reserve human involvement for a smaller subset of more difficult instances; ii) the spatiotemporal dependencies of track annotations can be elegantly and efficiently formulated through graphs, so we use a unified graph formulation to annotate both the detections and the identity associations of tracks across time. Based on these insights, SPAM produces high-quality annotations at a fraction of the ground-truth labeling cost. We demonstrate that trackers trained on SPAM labels achieve performance comparable to those trained on human annotations while requiring only 3-20% of the human labeling effort. Hence, SPAM paves the way towards highly efficient labeling of large-scale tracking datasets. Our code and models will be made available upon acceptance.
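The two insights can be illustrated with a toy example. The sketch below is a hypothetical illustration of the general idea, not SPAM's actual method: detections become graph nodes, edges link detections in nearby frames, and each edge carries an association score. Here plain IoU stands in for the score a pre-trained model would produce, and the names `build_track_graph`, `route_edges`, `max_gap`, `low`, and `high` are assumptions introduced for illustration. Confident edges are resolved automatically, and only ambiguous ones are routed to a human annotator.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)  # frozen => hashable, so detections can key a dict of edges
class Detection:
    frame: int
    det_id: int
    box: tuple  # (x1, y1, x2, y2) in pixels


def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_track_graph(detections: list, max_gap: int = 1) -> dict:
    """Hypothetical graph: connect detections whose frames differ by 1..max_gap.

    Edge weight is IoU, a stand-in for a learned association score.
    """
    edges = {}
    for u in detections:
        for v in detections:
            if 0 < v.frame - u.frame <= max_gap:
                edges[(u, v)] = iou(u.box, v.box)
    return edges


def route_edges(edges: dict, low: float = 0.3, high: float = 0.7):
    """Split edges into auto-accepted, auto-rejected, and human-review sets.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    accept, reject, review = [], [], []
    for edge, score in edges.items():
        if score >= high:
            accept.append(edge)  # easy case: pseudo-label kept as-is
        elif score <= low:
            reject.append(edge)  # easy case: clearly different objects
        else:
            review.append(edge)  # ambiguous: send to a human annotator
    return accept, reject, review


# Toy scene: two objects observed in frames 0 and 1.
dets = [
    Detection(0, 0, (0, 0, 10, 10)),
    Detection(0, 1, (50, 50, 60, 60)),
    Detection(1, 2, (1, 0, 11, 10)),    # small shift of det 0
    Detection(1, 3, (54, 50, 64, 60)),  # larger shift of det 1
]
accept, reject, review = route_edges(build_track_graph(dets))
print(len(accept), len(reject), len(review))
```

In this toy scene, one edge is accepted automatically, two are rejected, and only the ambiguous pair (det 1, det 3) with IoU of about 0.43 would reach a human, which mirrors the idea of reserving human effort for the harder instances.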
URL
https://arxiv.org/abs/2404.11426