Abstract
Modern crowd counting methods usually employ deep neural networks (DNNs) to estimate crowd counts via density regression. Despite their significant improvements, regression-based methods cannot detect individuals in crowds. Detection-based methods, on the other hand, have been little explored in recent crowd counting work because they need expensive bounding box annotations. In this work, we instead propose a new deep detection network that requires only point supervision. It can simultaneously detect the size and location of human heads and count them in crowds. We first mine useful person size information from point-level annotations and use it to initialize pseudo ground truth bounding boxes. An online updating scheme refines the pseudo ground truth during training, while a locally-constrained regression loss provides additional constraints on the size of the predicted boxes in a local neighborhood. Finally, we propose a curriculum learning strategy that trains the network first on images whose pseudo ground truth is relatively accurate and easy. Extensive experiments are conducted on both detection and counting tasks on several standard benchmarks, e.g. the ShanghaiTech, UCF_CC_50, WiderFace, and TRANCOS datasets, and the results show the superiority of our method over the state of the art.
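To make the initialization step concrete, here is a minimal sketch of turning point annotations into pseudo ground truth boxes. The abstract does not say how person size is mined from the points, so the nearest-neighbor heuristic below (box side equals the distance to the closest other annotated head) is an assumption chosen for illustration. The function name `init_pseudo_boxes` and the `default_size` fallback are likewise hypothetical.

```python
import numpy as np


def init_pseudo_boxes(points, default_size=32.0):
    """Return (N, 4) boxes [x1, y1, x2, y2], one square per annotated point.

    Assumption (not stated in the abstract): the side of each square is the
    distance from the point to its nearest annotated neighbor. A lone point
    has no neighbor, so it falls back to the hypothetical default_size.
    """
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    n = len(pts)
    if n == 0:
        return np.zeros((0, 4))
    if n == 1:
        sizes = np.array([default_size])
    else:
        # Pairwise Euclidean distances; O(N^2) memory, fine for a sketch.
        diff = pts[:, None, :] - pts[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
        sizes = dist.min(axis=1)
    half = sizes[:, None] / 2.0
    return np.hstack([pts - half, pts + half])


if __name__ == "__main__":
    heads = [(0, 0), (3, 4), (10, 0)]
    boxes = init_pseudo_boxes(heads)
    print(boxes)
    print("count:", len(boxes))  # counting by detection: one box per head
```

Under this heuristic, boxes would tend to be tight in dense regions and too large for isolated people, which is the kind of initialization error that the online updating scheme and the curriculum ordering described above are meant to absorb.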
URL
https://arxiv.org/abs/1904.01333