Rt-Track: Robust Tricks for Multi-Pedestrian Tracking

2023-03-16 22:08:29
Yukuan Zhang, Yunhua Jia, Housheng Xie, Mengzhen Li, Limin Zhao, Yang Yang, Shan Zhao

Abstract

Object tracking is divided into single-object tracking (SOT) and multi-object tracking (MOT). MOT aims to maintain the identities of multiple objects across continuous video sequences. In recent years, MOT has made rapid progress. However, modeling the motion and appearance of objects in complex scenes remains challenging. In this paper, we design a novel direction consistency method for smooth trajectory prediction (STP-DC) to strengthen the modeling of motion information and overcome the lack of robustness of previous methods in complex scenes. Existing methods use pedestrian re-identification (Re-ID) to model appearance; however, they extract more background information, which lacks discriminability in occluded and crowded scenes. We propose a hyper-grain feature embedding network (HG-FEN) to enhance appearance modeling and generate robust appearance descriptors. We also propose other robustness techniques, including CF-ECM for storing robust appearance information and SK-AS for improving association accuracy. To achieve state-of-the-art performance in MOT, we propose a robust tracker named Rt-track, which incorporates these tricks and techniques. It achieves 79.5 MOTA, 76.0 IDF1 and 62.1 HOTA on the MOT17 test set. Rt-track also achieves 77.9 MOTA, 78.4 IDF1 and 63.3 HOTA on MOT20, surpassing all published methods.
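
For readers unfamiliar with the metrics reported in the abstract, the following is a minimal sketch of how MOTA and IDF1 are conventionally computed from aggregate error counts. It is not taken from the paper: the function names and the sample counts are illustrative assumptions, and HOTA is omitted because it requires matching at multiple localization thresholds rather than a single closed-form ratio.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Conventional MOTA: 1 - (FN + FP + IDSW) / GT, returned as a fraction.

    All counts are summed over every frame of the sequence; num_gt is the
    total number of ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Conventional IDF1: 2*IDTP / (2*IDTP + IDFP + IDFN), as a fraction.

    IDTP, IDFP and IDFN are identity-level true positives, false positives
    and false negatives after a global trajectory-to-identity matching.
    """
    denominator = 2 * idtp + idfp + idfn
    if denominator == 0:
        raise ValueError("at least one count must be non-zero")
    return 2 * idtp / denominator


# Hypothetical counts chosen only to illustrate the arithmetic.
example_mota = mota(false_negatives=10, false_positives=5,
                    id_switches=5, num_gt=100)
example_idf1 = idf1(idtp=80, idfp=20, idfn=20)
```

Both functions return fractions; benchmark tables such as those for MOT17 and MOT20 usually report them multiplied by 100.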


URL

https://arxiv.org/abs/2303.09668

PDF

https://arxiv.org/pdf/2303.09668.pdf

