Tracklet Association Tracker: An End-to-End Learning-based Association Approach for Multi-Object Tracking

2018-08-05 05:41:45
Han Shen, Lichao Huang, Chang Huang, Wei Xu

Abstract

Traditional multiple object tracking (MOT) methods divide the task into two parts: affinity learning and data association. This separation requires a hand-crafted training objective in the affinity learning stage and a hand-crafted cost function in the data association stage, which prevents the tracking objective from being learned directly from the features. In this paper, we present a new MOT framework with a data-driven association method, named Tracklet Association Tracker (TAT). The framework unifies feature learning and data association through a bi-level optimization formulation, so that association results can be learned directly from features. To boost performance, we also adopt the popular hierarchical association scheme and perform the necessary alignment and selection of raw detection responses. Our model trains over 20x faster than a similar approach and achieves state-of-the-art performance on both the MOT2016 and MOT2017 benchmarks.

URL

https://arxiv.org/abs/1808.01562

PDF

https://arxiv.org/pdf/1808.01562.pdf

