Paper Reading AI Learner

FANTrack: 3D Multi-Object Tracking with Feature Association Network

2019-05-07 23:26:03
Erkan Baser, Venkateshwaran Balasubramanian, Prarthana Bhattacharyya, Krzysztof Czarnecki

Abstract

We propose a data-driven approach to online multi-object tracking (MOT) that uses a convolutional neural network (CNN) for data association in a tracking-by-detection framework. The problem of multi-target tracking aims to assign noisy detections to a-priori unknown and time-varying number of tracked objects across a sequence of frames. A majority of the existing solutions focus on either tediously designing cost functions or formulating the task of data association as a complex optimization problem that can be solved effectively. Instead, we exploit the power of deep learning to formulate the data association problem as inference in a CNN. To this end, we propose to learn a similarity function that combines cues from both image and spatial features of objects. Our solution learns to perform global assignments in 3D purely from data, handles noisy detections and a varying number of targets, and is easy to train. We evaluate our approach on the challenging KITTI dataset and show competitive results. Our code is available at https://git.uwaterloo.ca/wise-lab/fantrack.

Abstract (translated)

提出了一种基于数据驱动的在线多目标跟踪方法,该方法利用卷积神经网络(CNN)在检测跟踪框架中进行数据关联。多目标跟踪问题的目标是将噪声检测分配给一系列帧上的一个先验未知和时变跟踪对象。现有的解决方案大多集中在要么单调地设计成本函数,要么将数据关联的任务作为一个复杂的优化问题来制定,这是一个可以有效解决的问题。相反,我们利用深度学习的能力将数据关联问题表述为CNN中的推理。为此,我们建议学习一个相似函数,它结合了来自物体图像和空间特征的线索。我们的解决方案学习纯粹从数据执行三维全局任务,处理噪声检测和不同数量的目标,并且易于培训。我们评估了我们对具有挑战性的Kitti数据集的方法,并展示了竞争结果。我们的代码可以在https://git.uwaterloo.ca/wise-lab/fantrack上找到。

URL

https://arxiv.org/abs/1905.02843

PDF

https://arxiv.org/pdf/1905.02843.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot