Abstract
Accurate online multiple-camera vehicle tracking is essential for intelligent transportation systems, autonomous driving, and smart city applications. Like single-camera multiple-object tracking, it is commonly formulated as a tracking-by-detection graph problem. Within this framework, existing online methods usually follow two-stage procedures that cluster temporally first and then spatially, or vice versa, which is computationally expensive and prone to error accumulation. We introduce a graph representation that allows spatial-temporal clustering in a single, combined step: new detections are spatially and temporally connected with existing clusters. By keeping sparse appearance and positional cues of all detections in a cluster, our method can compare clusters based on the strongest available evidence. The final tracks are obtained online using a simple multicut assignment procedure. Our method does not require any training on the target scene, pre-extraction of single-camera tracks, or additional annotations. Notably, we outperform the online state-of-the-art in IDF1 by more than 14% on the CityFlow dataset and by more than 25% on the Synthehicle dataset. The code is publicly available.
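The combined spatial-temporal step described above can be pictured with a minimal sketch: clusters retain a sparse set of appearance embeddings and positions from their member detections, and each new detection is compared against the strongest stored cue of every cluster. All names here (`Cluster`, `similarity`, `assign`, the cue budget `max_cues`, the `dist_scale` weighting) are illustrative assumptions, and the greedy assignment merely stands in for the paper's multicut procedure, which is not reproduced here:

```python
import numpy as np

class Cluster:
    """A cross-camera track keeping sparse cues of its member detections."""

    def __init__(self, embedding, position, max_cues=10):
        self.embeddings = [embedding]  # unit-normalized appearance embeddings
        self.positions = [position]    # e.g. ground-plane coordinates
        self.max_cues = max_cues

    def add(self, embedding, position):
        self.embeddings.append(embedding)
        self.positions.append(position)
        # keep the cue set sparse by dropping the oldest entries
        self.embeddings = self.embeddings[-self.max_cues:]
        self.positions = self.positions[-self.max_cues:]

    def similarity(self, embedding, position, dist_scale=10.0):
        # "strongest available evidence": best appearance match over all cues
        app = max(float(e @ embedding) for e in self.embeddings)
        # positional affinity from the closest stored position
        pos = min(float(np.linalg.norm(p - position)) for p in self.positions)
        return app - pos / dist_scale

def assign(detections, clusters, threshold=0.5):
    """Greedy online assignment of new detections to existing clusters.

    `detections` is a list of (embedding, position) pairs gathered from all
    cameras at the current time step. A detection either joins the cluster
    with the strongest combined evidence or opens a new cluster.
    """
    for emb, pos in detections:
        scores = [c.similarity(emb, pos) for c in clusters]
        if scores and max(scores) > threshold:
            clusters[int(np.argmax(scores))].add(emb, pos)
        else:
            clusters.append(Cluster(emb, pos))
    return clusters
```

Calling `assign` once per time step over detections from all cameras mirrors the online, single-step formulation: spatial (cross-camera) and temporal association happen in the same pass, rather than in two separate clustering stages.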
URL
https://arxiv.org/abs/2410.02638