Paper Reading AI Learner

Active Learning for Video Classification with Frame Level Queries

2023-07-10 15:47:13
Debanjan Goswami, Shayok Chakraborty

Abstract

Deep learning algorithms have pushed the boundaries of computer vision research and have demonstrated commendable performance in a variety of applications. However, training a robust deep neural network necessitates a large amount of labeled training data, and acquiring such data involves significant time and human effort. This problem is even more serious for an application like video classification, where a human annotator has to watch an entire video end-to-end to furnish a label. Active learning algorithms automatically identify the most informative samples from large amounts of unlabeled data; this tremendously reduces the human annotation effort in inducing a machine learning model, as only the few samples identified by the algorithm need to be labeled manually. In this paper, we propose a novel active learning framework for video classification, with the goal of further reducing the labeling onus on the human annotators. Our framework identifies a batch of exemplar videos, together with a set of informative frames for each video; the human annotator merely needs to review the frames and provide a label for each video. This involves much less manual work than watching the complete video to come up with a label. We formulate a criterion based on uncertainty and diversity to identify the informative videos, and exploit representative sampling techniques to extract a set of exemplar frames from each video. To the best of our knowledge, this is the first research effort to develop an active learning framework for video classification where the annotators need to inspect only a few frames to produce a label, rather than watching the video end-to-end.
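
The abstract describes the selection pipeline only at a high level: score unlabeled videos by uncertainty and diversity, select a batch, and summarize each selected video with a few representative frames for the annotator. The sketch below is one plausible instantiation of that pipeline under assumed choices, not the paper's actual method: it uses predictive entropy as the uncertainty measure, a greedy farthest-point criterion for diversity, and k-means over frame features for representative frame sampling. All function names and parameters are illustrative.

# Minimal, hypothetical sketch of frame-level active learning for video
# classification, assuming entropy-based uncertainty, greedy feature-space
# diversity, and k-means representative frame selection (the paper's exact
# criterion is not specified in this abstract).
import numpy as np
from sklearn.cluster import KMeans


def video_uncertainty(class_probs: np.ndarray) -> float:
    """Predictive entropy of a video's class probability vector."""
    p = np.clip(class_probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())


def select_videos(features: np.ndarray, probs: np.ndarray,
                  budget: int, trade_off: float = 0.5) -> list[int]:
    """Greedily pick `budget` videos that score high on uncertainty and on
    distance (diversity) from the videos already selected."""
    uncertainty = np.array([video_uncertainty(p) for p in probs])
    uncertainty = uncertainty / (uncertainty.max() + 1e-12)
    selected: list[int] = []
    for _ in range(budget):
        if selected:
            # Distance of every video to its nearest already-selected video.
            d = np.linalg.norm(
                features[:, None, :] - features[selected][None, :, :], axis=-1
            ).min(axis=1)
            d = d / (d.max() + 1e-12)
        else:
            d = np.ones(len(features))
        score = trade_off * uncertainty + (1 - trade_off) * d
        score[selected] = -np.inf  # never re-select a video
        selected.append(int(score.argmax()))
    return selected


def exemplar_frames(frame_features: np.ndarray, n_frames: int = 5) -> list[int]:
    """Pick the frames closest to k-means centroids as a representative summary."""
    k = min(n_frames, len(frame_features))
    km = KMeans(n_clusters=k, n_init=10).fit(frame_features)
    idx = [int(np.linalg.norm(frame_features - c, axis=1).argmin())
           for c in km.cluster_centers_]
    return sorted(set(idx))

In a full active learning loop, the indices returned by exemplar_frames would be rendered to the annotator, the video labels they provide would be added to the training set, and the classifier would be retrained before the next selection round.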


URL

https://arxiv.org/abs/2307.05587

PDF

https://arxiv.org/pdf/2307.05587.pdf

