Spatial-Temporal Relation Networks for Multi-Object Tracking

Abstract

Recent progress in multiple object tracking (MOT) has shown that a robust similarity score is key to the success of trackers. A good similarity score is expected to reflect multiple cues, e.g. appearance, location, and topology, over a long period of time. However, these cues are heterogeneous, which makes them hard to combine in a unified network. As a result, existing methods usually encode them in separate networks or require a complex training approach. In this paper, we present a unified framework for similarity measurement that simultaneously encodes various cues and performs reasoning across both spatial and temporal domains. We also study the feature representation of a tracklet-object pair in depth, showing that a proper design of the pair features can considerably benefit the tracker. The resulting approach is named spatial-temporal relation networks (STRN). It runs in a feed-forward way and can be trained end-to-end. It achieves state-of-the-art accuracy on all of the MOT15-17 benchmarks under the public-detection, online setting.
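
To make the idea of a multi-cue tracklet-object similarity score concrete, here is a minimal hand-crafted sketch. It is not the STRN model: STRN learns the score with a network, whereas this example simply averages an appearance cue over the tracklet's history with an exponential decay and adds a location cue (box IoU) with fixed weights. The topology cue is omitted, and all names and parameters (`tracklet_object_similarity`, `w_app`, `w_loc`, `decay`) are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tracklet_object_similarity(tracklet, detection,
                               w_app=0.5, w_loc=0.5, decay=0.8):
    """Hypothetical hand-set score, NOT the learned STRN similarity.

    tracklet:  list of (feature, box) pairs, ordered oldest to newest.
    detection: a single (feature, box) pair from the current frame.
    """
    if not tracklet:
        raise ValueError("tracklet must contain at least one observation")
    det_feat, det_box = detection

    # Temporal cue: decay-weighted mean of appearance similarity,
    # with the most recent observation weighted highest.
    weighted_sum, weight_total = 0.0, 0.0
    for age, (feat, _box) in enumerate(reversed(tracklet)):
        w = decay ** age
        weighted_sum += w * cosine_similarity(feat, det_feat)
        weight_total += w
    appearance = weighted_sum / weight_total

    # Spatial cue: overlap between the last tracklet box and the detection.
    location = box_iou(tracklet[-1][1], det_box)

    return w_app * appearance + w_loc * location


tracklet = [([1.0, 0.0], (0, 0, 10, 10)), ([1.0, 0.0], (1, 0, 11, 10))]
same_object = ([1.0, 0.0], (1, 0, 11, 10))
other_object = ([0.0, 1.0], (100, 100, 110, 110))

score_same = tracklet_object_similarity(tracklet, same_object)
score_other = tracklet_object_similarity(tracklet, other_object)
print(score_same, score_other)
```

In a tracker, scores like these would fill a tracklet-by-detection matrix that is then passed to an assignment step. The fixed weights are exactly the kind of manual cue fusion that a unified, learned similarity is meant to replace.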

URL

https://arxiv.org/abs/1904.11489

PDF

https://arxiv.org/pdf/1904.11489.pdf