Abstract
Although end-to-end multi-object trackers such as MOTR have the merit of simplicity, they suffer seriously from a conflict between detection and association, which results in unsatisfactory convergence dynamics. MOTRv2 partly addresses this problem, but it requires an additional detection network for assistance. In this work, we are the first to reveal that this conflict arises from unfair label assignment between detect queries and track queries during training, where detect queries recognize targets and track queries associate them. Based on this observation, we propose MOTRv3, which balances the label assignment process using a release-fetch supervision strategy. In this strategy, labels are first released for detection and then gradually fetched back for association. In addition, two further strategies, pseudo label distillation and track group denoising, are designed to improve the supervision for detection and association. Without the assistance of an extra detection network during inference, MOTRv3 achieves impressive performance across diverse benchmarks such as MOT17 and DanceTrack.
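The release-fetch idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `split_labels` and `linear_schedule`, the `fetch_ratio` parameter, and the linear schedule are all hypothetical, and the abstract does not say over what axis (training steps, decoder layers, or something else) labels are fetched back. The sketch only shows the general shape of the idea: at first every label supervises the detect queries, and a growing share of the already-tracked labels is then handed back to the track queries.

```python
from typing import List, Set, Tuple


def split_labels(
    gt_ids: List[int],
    tracked_ids: Set[int],
    fetch_ratio: float,
) -> Tuple[List[int], List[int]]:
    """Split ground-truth ids between track queries and detect queries.

    Hypothetical illustration only, not the MOTRv3 algorithm.
    fetch_ratio = 0.0: every label is released to the detect queries.
    fetch_ratio = 1.0: every already-tracked label is fetched back
    by the track queries.
    """
    if not 0.0 <= fetch_ratio <= 1.0:
        raise ValueError("fetch_ratio must be in [0, 1]")
    # Labels a track query could claim because the target was seen before.
    claimable = [i for i in gt_ids if i in tracked_ids]
    # Number of claimable labels handed back to track queries (rounded down).
    n_fetch = int(fetch_ratio * len(claimable))
    to_track = claimable[:n_fetch]
    fetched = set(to_track)
    # Everything not fetched stays with the detect queries.
    to_detect = [i for i in gt_ids if i not in fetched]
    return to_track, to_detect


def linear_schedule(step: int, total_steps: int) -> float:
    """Assumed schedule: the fetch ratio grows linearly from 0 to 1."""
    return min(1.0, max(0.0, step / max(1, total_steps)))


if __name__ == "__main__":
    gt = [1, 2, 3, 4, 5]      # ids present in the current frame
    tracked = {1, 2, 3, 4}    # ids that already have a track query
    for step in (0, 5, 10):
        ratio = linear_schedule(step, total_steps=10)
        print(step, split_labels(gt, tracked, ratio))
```

In this toy setting, target 5 is new and therefore always belongs to the detect queries, while targets 1 to 4 move from the detect side to the track side as the ratio grows.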
URL
https://arxiv.org/abs/2305.14298