Abstract
We propose a data-driven approach to online multi-object tracking (MOT) that uses a convolutional neural network (CNN) for data association in a tracking-by-detection framework. Multi-target tracking aims to assign noisy detections to an a priori unknown and time-varying number of tracked objects across a sequence of frames. Most existing solutions focus either on tediously designing cost functions or on formulating data association as a complex optimization problem that can be solved effectively. Instead, we exploit the power of deep learning to formulate the data association problem as inference in a CNN. To this end, we propose to learn a similarity function that combines cues from both image and spatial features of objects. Our solution learns to perform global assignments in 3D purely from data, handles noisy detections and a varying number of targets, and is easy to train. We evaluate our approach on the challenging KITTI dataset and show competitive results. Our code is available at https://git.uwaterloo.ca/wise-lab/fantrack.
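To make the data association setting concrete, the following is a minimal sketch of the conventional, hand-designed baseline that the abstract contrasts with. It is not the proposed method: the fixed-weight similarity below stands in for the learned similarity function, and the greedy matching stands in for the CNN-based assignment. All names, dictionary keys, and parameters (`similarity`, `associate`, `"pos"`, `"feat"`, `sigma`, `w_app`, `min_sim`) are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def similarity(track, det, sigma=2.0, w_app=0.5):
    """Hand-crafted stand-in for a learned similarity (assumption).

    Combines an image cue (cosine similarity of appearance features)
    with a spatial cue (Gaussian kernel on the 3D centre distance).
    """
    appearance = cosine(track["feat"], det["feat"])
    distance = math.dist(track["pos"], det["pos"])
    spatial = math.exp(-(distance ** 2) / (2 * sigma ** 2))
    return w_app * appearance + (1 - w_app) * spatial


def associate(tracks, dets, min_sim=0.5):
    """Greedy one-to-one assignment of detections to tracks.

    Returns (matches, unmatched_tracks, new_dets), where matches maps a
    track index to a detection index. Unmatched detections are candidates
    for new targets; unmatched tracks are candidates for termination.
    """
    scored = sorted(
        ((similarity(t, d), i, j)
         for i, t in enumerate(tracks)
         for j, d in enumerate(dets)),
        reverse=True,
    )
    matches, used_dets = {}, set()
    for score, i, j in scored:
        if score < min_sim:
            break  # remaining pairs are too dissimilar to match
        if i in matches or j in used_dets:
            continue  # keep the assignment one-to-one
        matches[i] = j
        used_dets.add(j)
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matches]
    new_dets = [j for j in range(len(dets)) if j not in used_dets]
    return matches, unmatched_tracks, new_dets
```

Each track and detection here is a dictionary holding a 3D position under `"pos"` and an appearance vector under `"feat"`. The hand-tuned weights and thresholds in this sketch are exactly the kind of design effort the abstract proposes to replace with a function learned from data.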
URL
https://arxiv.org/abs/1905.02843