Abstract
Smart data selection is becoming increasingly important in data-driven machine learning. Active learning offers a promising solution by enabling machine learning models to be trained effectively on a well-chosen subset of data containing the most informative samples from a large dataset. Wildlife data captured by camera traps are vast in volume, requiring tremendous effort both to label the data and to train animal detection models. Therefore, applying active learning to reduce the amount of labelled data required would greatly aid automated wildlife monitoring and conservation. However, existing active learning techniques require full access to the machine learning model (here, an object detector), which limits their applicability. In this paper, we propose a model-agnostic active learning approach for detecting animals in camera-trap images. Our approach integrates uncertainty and diversity measures of samples, at both the object level and the image level, into the active learning sample selection process. We validate our approach on a benchmark animal dataset. Experimental results demonstrate that, using only 30% of the training data selected by our approach, a state-of-the-art animal detector can achieve performance equal to or greater than that obtained with the complete training dataset.
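The abstract does not specify how the uncertainty and diversity quantities are computed or combined. The sketch below only illustrates what a model-agnostic selection step can look like: it ranks unlabelled images using nothing but a detector's outputs (per-detection confidence scores and an image embedding), with no access to gradients or internal layers. Binary entropy as object-level uncertainty, mean aggregation to the image level, Euclidean distance between embeddings as diversity, the weight `alpha`, and every function name (`binary_entropy`, `image_uncertainty`, `select_batch`) are illustrative assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION (hypothetical format): for each unlabelled image, the detector's
# outputs are a list of detection confidences in (0, 1) plus an image embedding.
# Only outputs are used, so any black-box detector could supply them.
Detections = List[float]
Embedding = Sequence[float]


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Object-level uncertainty: entropy of one detection's confidence score."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def image_uncertainty(confidences: Detections) -> float:
    """Image-level uncertainty: mean object-level entropy (0 if nothing detected)."""
    if not confidences:
        return 0.0
    return sum(binary_entropy(c) for c in confidences) / len(confidences)


def euclidean(a: Embedding, b: Embedding) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_batch(
    pool: Dict[str, Tuple[Detections, Embedding]],
    budget: int,
    alpha: float = 0.5,
) -> List[str]:
    """Greedily pick up to `budget` images, trading uncertainty against diversity.

    score = alpha * uncertainty + (1 - alpha) * distance to nearest selected image
    (the diversity term is 0 for the first pick, so it is chosen by uncertainty alone).
    """
    unc = {k: image_uncertainty(dets) for k, (dets, _) in pool.items()}
    selected: List[str] = []
    remaining = set(pool)
    while remaining and len(selected) < budget:
        def score(k: str) -> float:
            diversity = (
                min(euclidean(pool[k][1], pool[s][1]) for s in selected)
                if selected
                else 0.0
            )
            return alpha * unc[k] + (1.0 - alpha) * diversity
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Usage with made-up detector outputs (hypothetical data):
pool = {
    "img_a": ([0.50, 0.55], [0.0, 0.0]),
    "img_b": ([0.52], [0.1, 0.0]),
    "img_c": ([0.95, 0.90], [5.0, 5.0]),
    "img_d": ([0.99], [0.0, 0.1]),
}
print(select_batch(pool, budget=2))  # → ['img_b', 'img_c']
```

In this toy pool, `img_b` is picked first for its uncertain detection, and `img_c` second because it lies far from `img_b` in embedding space, even though `img_a` is more uncertain. In practice the uncertainty and distance terms live on different scales and would likely need normalising before being combined.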
URL
https://arxiv.org/abs/2507.06537