Paper Reading AI Learner

Deep Multi-Shot Network for modelling Appearance Similarity in Multi-Person Tracking applications

2020-04-07 16:43:35
María J. Gómez-Silva

Abstract

The automatization of Multi-Object Tracking becomes a demanding task in real unconstrained scenarios, where the algorithms have to deal with crowds, crossing people, occlusions, disappearances and the presence of visually similar individuals. In those circumstances, the data association between the incoming detections and their corresponding identities could miss some tracks or produce identity switches. In order to reduce these tracking errors, and even their propagation in further frames, this article presents a Deep Multi-Shot neural model for measuring the Degree of Appearance Similarity (MS-DoAS) between person observations. This model provides temporal consistency to the individuals' appearance representation, and provides an affinity metric to perform frame-by-frame data association, allowing online tracking. The model has been deliberately trained to be able to manage the presence of previous identity switches and missed observations in the handled tracks. With that purpose, a novel data generation tool has been designed to create training tracklets that simulate such situations. The model has demonstrated a high capacity to discern when a new observation corresponds to a certain track, achieving a classification accuracy of 97\% in a hard test that simulates tracks with previous mistakes. Moreover, the tracking efficiency of the model in a Surveillance application has been demonstrated by integrating that into the frame-by-frame association of a Tracking-by-Detection algorithm.

Abstract (translated)

URL

https://arxiv.org/abs/2004.03531

PDF

https://arxiv.org/pdf/2004.03531.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot