Abstract
Object detection and re-identification, the core components of multi-object tracking, have seen remarkable progress in recent years. However, little attention has been paid to accomplishing the two tasks in a single network to improve inference speed. Initial attempts along this path produced degraded results, mainly because the re-identification branch was not learned appropriately. In this work, we study the essential reasons behind this failure and accordingly present a simple baseline that addresses these problems. It remarkably outperforms the state of the art on public datasets at 30 fps. We hope this baseline can inspire and help evaluate new ideas in this field. The code and pre-trained models are available at this https URL.
URL
https://arxiv.org/abs/2004.01888