
SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow

2024-04-17 14:33:41
Orcun Cetintas, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé

Abstract

Making trajectory annotation from videos more efficient has the potential to enable the next generation of data-hungry tracking algorithms to thrive on large-scale datasets. Despite the importance of this task, very few works currently explore how to label tracking datasets efficiently and comprehensively. In this work, we introduce SPAM, a tracking data engine that provides high-quality labels with minimal human intervention. SPAM is built around two key insights: i) most tracking scenarios can be easily resolved, so we use a pre-trained model to generate high-quality pseudo-labels and reserve human involvement for a smaller subset of more difficult instances; ii) the spatiotemporal dependencies of track annotations across time can be handled elegantly and efficiently through graphs, so we use a unified graph formulation to address the annotation of both detections and identity association for tracks across time. Based on these insights, SPAM produces high-quality annotations at a fraction of the ground-truth labeling cost. We demonstrate that trackers trained on SPAM labels achieve performance comparable to those trained on human annotations while requiring only 3-20% of the human labeling effort. Hence, SPAM paves the way towards highly efficient labeling of large-scale tracking datasets. Our code and models will be available upon acceptance.
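
The first insight (pseudo-label the easy cases, send only the hard ones to a human) can be illustrated with a small sketch. This is not the authors' implementation: the `Edge` type, the `split_by_uncertainty` function, the 0.5 decision threshold, and the "distance from 0.5" uncertainty measure are all assumptions made for illustration. It only shows how a fixed human budget, such as the 3-20% range quoted in the abstract, might be spent on the least certain associations in a tracking graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical candidate association between two detections."""
    src: str
    dst: str
    score: float  # assumed model probability that src and dst share an identity


def split_by_uncertainty(edges, budget_fraction):
    """Return (pseudo_labels, needs_human) for a list of edges.

    The edges whose score is closest to 0.5 (the most uncertain ones) are
    reserved for a human annotator, up to budget_fraction of all edges.
    Every other edge is pseudo-labeled by thresholding its score at 0.5.
    """
    if not 0.0 <= budget_fraction <= 1.0:
        raise ValueError("budget_fraction must lie in [0, 1]")
    n_human = int(len(edges) * budget_fraction)
    # Most uncertain edges first.
    ranked = sorted(edges, key=lambda e: abs(e.score - 0.5))
    needs_human = ranked[:n_human]
    pseudo_labels = {(e.src, e.dst): e.score >= 0.5 for e in ranked[n_human:]}
    return pseudo_labels, needs_human
```

With five candidate edges and a budget of 0.2, one edge (the one scored nearest to 0.5) would be routed to a human and the other four would receive pseudo-labels. A real system would also need to propagate the human answers back through the graph, which this sketch does not attempt.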

Abstract (translated)

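Improving the efficiency of trajectory annotation from videos has the potential to let the next generation of data-hungry tracking algorithms thrive on large-scale datasets. Although this task is important, few works currently explore how to label tracking datasets comprehensively and efficiently. In this work, we introduce SPAM, a tracking data engine that provides high-quality annotations with minimal human intervention. SPAM is based on two key insights: i) most tracking scenarios can be resolved easily. To exploit this, we use a pre-trained model to generate high-quality pseudo-labels and limit human involvement to a smaller subset of more difficult instances; ii) the spatiotemporal dependencies of track annotations across time can be formulated elegantly and efficiently through graphs. We therefore use a unified graph formulation to annotate both detections and identity associations for tracks across time. Based on these insights, SPAM produces high-quality annotations at a fraction of the ground-truth labeling cost. We show that trackers trained on SPAM labels perform comparably to trackers trained on human annotations while requiring only 3-20% of the human labeling effort. SPAM thus paves the way for efficient labeling of large-scale tracking datasets. Our code and models will be made publicly available upon acceptance.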

URL

https://arxiv.org/abs/2404.11426

PDF

https://arxiv.org/pdf/2404.11426.pdf

