Abstract
Deep learning algorithms have pushed the boundaries of computer vision research and have demonstrated commendable performance in a variety of applications. However, training a robust deep neural network requires a large amount of labeled training data, and acquiring it involves significant time and human effort. The problem is even more serious for an application like video classification, where a human annotator has to watch an entire video end-to-end to furnish a label. Active learning algorithms automatically identify the most informative samples from large amounts of unlabeled data; this substantially reduces the human annotation effort in inducing a machine learning model, as only the few samples identified by the algorithm need to be labeled manually. In this paper, we propose a novel active learning framework for video classification, with the goal of further reducing the labeling burden on human annotators. Our framework identifies a batch of exemplar videos, together with a set of informative frames for each video; the human annotator merely needs to review the frames and provide a label for each video. This involves much less manual work than watching the complete video to come up with a label. We formulate a criterion based on uncertainty and diversity to identify the informative videos, and exploit representative sampling techniques to extract a set of exemplar frames from each video. To the best of our knowledge, this is the first research effort to develop an active learning framework for video classification in which the annotators need to inspect only a few frames to produce a label, rather than watching the entire video.
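The abstract does not give the exact selection criterion or the frame sampling method, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes uncertainty is measured by the entropy of the model's class probabilities, diversity by Euclidean distance to the videos already selected, and representative frames by farthest-point sampling over per-frame feature vectors. The names `select_videos`, `select_frames`, and `diversity_weight` are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy of one class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_videos(class_probs, features, batch_size, diversity_weight=1.0):
    """Greedily pick videos that are uncertain and far from those already picked.

    Assumption: score = entropy + diversity_weight * distance to nearest selected video.
    """
    n = len(class_probs)
    uncertainty = [entropy(p) for p in class_probs]
    selected = []
    min_dist = [0.0] * n  # nothing selected yet, so the first pick uses uncertainty only
    while len(selected) < min(batch_size, n):
        best = max(
            (i for i in range(n) if i not in selected),
            key=lambda i: uncertainty[i] + diversity_weight * min_dist[i],
        )
        selected.append(best)
        # Update each video's distance to its nearest selected video.
        for i in range(n):
            d = math.dist(features[i], features[best])
            min_dist[i] = d if len(selected) == 1 else min(min_dist[i], d)
    return selected


def select_frames(frame_features, num_frames):
    """Pick frame indices that cover the video's feature space.

    Assumption: start from the frame closest to the mean feature, then
    repeatedly add the frame farthest from all frames chosen so far.
    """
    n = len(frame_features)
    dim = len(frame_features[0])
    mean = [sum(f[d] for f in frame_features) / n for d in range(dim)]
    first = min(range(n), key=lambda i: math.dist(frame_features[i], mean))
    chosen = [first]
    min_dist = [math.dist(f, frame_features[first]) for f in frame_features]
    while len(chosen) < min(num_frames, n):
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [
            min(m, math.dist(f, frame_features[nxt]))
            for m, f in zip(min_dist, frame_features)
        ]
    return sorted(chosen)  # return frames in temporal order for the annotator
```

In this sketch, an annotator would be shown only the frames returned by `select_frames` for each video returned by `select_videos`. Setting `diversity_weight` to 0 reduces the video selection to pure uncertainty sampling.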
URL
https://arxiv.org/abs/2307.05587