Abstract
Multiple-object tracking and behavior analysis are essential parts of surveillance video analysis for public security and urban management. With billions of surveillance videos captured all over the world, performing multiple-object tracking and behavior analysis manually is cumbersome and expensive. With the rapid development of deep learning algorithms in recent years, automatic object tracking and behavior analysis urgently require a large-scale, well-annotated surveillance video dataset that reflects the diverse, congested, and complicated scenarios found in real applications. This paper introduces an urban surveillance video dataset (USVD), which is by far the largest and most comprehensive. The dataset consists of 16 scenes captured in 7 typical outdoor scenarios: street, crossroads, hospital entrance, school gate, park, pedestrian mall, and public square. Over 200k video frames are carefully annotated, resulting in more than 3.7 million object bounding boxes and about 7.1 thousand trajectories. We further use this dataset to evaluate the performance of typical algorithms for multiple-object tracking and anomalous behavior analysis, and explore the robustness of these methods in congested urban scenarios.
URL
https://arxiv.org/abs/1904.11784