Learning from Synthetic Data for Crowd Counting in the Wild

2019-03-08 06:40:50
Qi Wang, Junyu Gao, Wei Lin, Yuan Yuan

Abstract

Counting the number of people in crowd scenes has recently become a hot topic because of its widespread applications (e.g., video surveillance, public security). It is a difficult task in the wild: changeable environments and a wide range of crowd sizes prevent current methods from working well. In addition, because data are scarce, many methods suffer from over-fitting to varying degrees. To remedy these two problems, we first develop a data collector and labeler that can generate synthetic crowd scenes and simultaneously annotate them without any manual labor. Using it, we build a large-scale, diverse synthetic dataset. Second, we propose two schemes that exploit the synthetic data to boost the performance of crowd counting in the wild: 1) pretrain a crowd counter on the synthetic data, then fine-tune it on real data, which significantly improves the model's performance on real data; 2) a crowd counting method based on domain adaptation, which frees humans from heavy data annotation. Extensive experiments show that the first method achieves state-of-the-art performance on four real datasets, and the second outperforms our baselines. The dataset and source code are available at https://gjy3035.github.io/GCC-CL/.
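
The first scheme in the abstract (pretrain on synthetic data, then fine-tune on real data) can be illustrated with a deliberately tiny sketch. This is not the authors' network or training code: the "counter" here is a hypothetical one-feature linear model, and the data generator, slopes, offsets, learning rates, and sample sizes are all assumptions chosen only to show the workflow when synthetic labels are plentiful and real labels are scarce.

```python
import random


def make_data(n, slope, offset, noise, rng):
    """Toy (feature, count) pairs; slope/offset define a hypothetical domain."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = slope * x + offset + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def fit(w, b, data, lr, epochs):
    """Plain SGD on squared error for the model count = w * x + b."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mae(w, b, data):
    """Mean absolute error of predicted counts."""
    return sum(abs(w * x + b - y) for x, y in data) / len(data)


rng = random.Random(0)
# Assumption: the synthetic domain is close to, but not identical to, the real one.
synthetic = make_data(500, 10.0, 1.0, 0.1, rng)   # large, automatically labeled
real_train = make_data(20, 11.0, 1.5, 0.1, rng)   # small, manually labeled
real_test = make_data(200, 11.0, 1.5, 0.1, rng)

# Step 1: pretrain on synthetic data only.
w, b = fit(0.0, 0.0, synthetic, lr=0.1, epochs=20)
pre_mae = mae(w, b, real_test)

# Step 2: fine-tune the pretrained parameters on the scarce real data.
w, b = fit(w, b, real_train, lr=0.05, epochs=200)
ft_mae = mae(w, b, real_test)

print(f"MAE on real test data after pretraining only: {pre_mae:.3f}")
print(f"MAE on real test data after fine-tuning:      {ft_mae:.3f}")
```

In this toy setting the pretrained model carries the domain gap into its predictions on real data, and fine-tuning on a handful of real samples closes most of that gap. How well this carries over to actual crowd counters depends on the model and datasets, which the sketch does not attempt to reproduce.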

URL

https://arxiv.org/abs/1903.03303

PDF

https://arxiv.org/pdf/1903.03303.pdf

