Object detection is one of the most fundamental tasks in computer vision and is broadly used in pose estimation, object tracking, and instance segmentation models. To obtain training data for object detection models efficiently, many datasets collect their unannotated data in video format, and annotators must draw a bounding box around each object in every frame. Annotating every frame of a video is costly and inefficient, since many frames contain very similar information for the model to learn from. Selecting the most informative frames of a video to annotate is therefore a highly practical problem, yet it has attracted little research attention. In this paper, we propose a novel active learning algorithm for object detection models to tackle this problem. The algorithm measures and aggregates both the classification and the localization informativeness of unlabelled data. Exploiting the temporal information in video frames, we propose two novel localization informativeness measures. Furthermore, we introduce a weight curve that discourages querying adjacent frames. The proposed active learning algorithm was evaluated with multiple configurations on the MuPoTS and FootballPD datasets.
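The selection step described in the abstract can be sketched as a greedy loop: each unlabelled frame gets an aggregated informativeness score, and a weight curve suppresses frames close to already-selected ones so that adjacent, near-duplicate frames are not queried. The function name, the linear weight curve, and all parameters below are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of informativeness-weighted frame selection.
# The linear weight curve (rising with distance from a selected frame)
# is an assumption for illustration, not the paper's actual curve.

def select_frames(scores, budget, radius=5, floor=0.2):
    """Greedily pick `budget` frames by aggregated informativeness,
    applying a weight curve that down-weights frames within `radius`
    of an already-selected frame."""
    n = len(scores)
    weights = [1.0] * n              # weight curve, initially flat
    selected = []
    for _ in range(budget):
        # pick the frame with the highest weighted informativeness
        best = max(range(n), key=lambda i: scores[i] * weights[i])
        selected.append(best)
        weights[best] = 0.0          # never query the same frame twice
        # suppress neighbours: weight grows linearly with distance
        for i in range(max(0, best - radius), min(n, best + radius + 1)):
            if i != best:
                d = abs(i - best)
                weights[i] = min(weights[i], floor + (1 - floor) * d / radius)
    return sorted(selected)

# Example: a 12-frame clip whose informativeness peaks around frame 5.
scores = [0.1, 0.2, 0.3, 0.7, 0.9, 1.0, 0.95, 0.8, 0.4, 0.2, 0.6, 0.1]
print(select_frames(scores, budget=3))  # → [5, 7, 10]
```

Without the weight curve, the three highest raw scores (frames 4, 5, 6) would all be queried despite being nearly identical; the suppression spreads the annotation budget across the clip.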