Abstract
This work addresses the unsupervised adaptation of an existing object detector to a new target domain. We assume that a large number of unlabeled videos from this domain are readily available. We automatically obtain labels on the target data from the high-confidence detections of the existing detector, augmented with hard (misclassified) examples that a tracker recovers by exploiting temporal cues. These automatically obtained labels are then used to re-train the original model. We propose a modified knowledge distillation loss and investigate several ways of assigning soft labels to the training examples from the target domain. We evaluate the approach empirically on challenging face and pedestrian detection tasks. A face detector trained on WIDER-Face, which consists of high-quality images crawled from the web, is adapted to a large-scale surveillance data set. A pedestrian detector trained on clear, daytime images from the BDD-100K driving data set is adapted to all other conditions, such as rainy, foggy, and night-time scenes. Our results demonstrate the usefulness of incorporating hard examples obtained from tracking and the advantage of soft labels with a distillation loss over hard labels. They also show that this simple method performs promisingly for unsupervised domain adaptation of object detectors, with minimal dependence on hyper-parameters.
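The labeling and re-training steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Detection` record, the confidence threshold of 0.8, the fixed soft label for tracker-recovered boxes, and the interpolation weight `lam` are all hypothetical choices made for illustration. The loss is a generic knowledge-distillation-style mix of hard-label and soft-label binary cross-entropy, not the paper's exact modified loss.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple                  # (x1, y1, x2, y2); hypothetical box format
    score: float                # teacher (original detector) confidence in [0, 1]
    from_tracker: bool = False  # True if the box was recovered by the tracker


def select_pseudo_labels(detections, tracked_boxes, threshold=0.8):
    """Return pseudo-labelled positives for one frame (illustrative only).

    High-confidence teacher detections are kept as positives. Tracker-propagated
    boxes that the teacher scored below the threshold are added as hard examples,
    since the detector missed them. The threshold is an assumed value.
    """
    positives = [d for d in detections if d.score >= threshold]
    hard = [t for t in tracked_boxes if t.score < threshold]
    return positives + hard


def soft_label(det, tracker_label=1.0):
    """One hypothetical soft-label rule: use the teacher score for detections
    and a fixed value for tracker-recovered hard examples."""
    return tracker_label if det.from_tracker else det.score


def bce(p, target, eps=1e-7):
    """Binary cross-entropy between prediction p and a (possibly soft) target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(p, hard_target, soft_target, lam=0.5):
    """Generic interpolation of hard-label and soft-label losses.

    lam weights the soft term; lam=0 reduces to training on hard pseudo-labels.
    This is an assumed form, not the paper's modified distillation loss.
    """
    return (1.0 - lam) * bce(p, hard_target) + lam * bce(p, soft_target)


if __name__ == "__main__":
    dets = [Detection((0, 0, 10, 10), 0.95), Detection((5, 5, 20, 20), 0.30)]
    tracked = [Detection((40, 40, 60, 60), 0.20, from_tracker=True)]
    for d in select_pseudo_labels(dets, tracked):
        # Loss for a hypothetical student prediction of 0.7 on each pseudo-label.
        print(d.box, d.from_tracker, round(distillation_loss(0.7, 1.0, soft_label(d)), 4))
```

Setting `lam=0` collapses the loss to ordinary cross-entropy on hard pseudo-labels. This makes the sketch a convenient toy for contrasting hard-label and soft-label training.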
URL
https://arxiv.org/abs/1904.07305