Context-Aware Query Selection for Active Learning in Event Recognition

2019-04-09 00:58:23
Mahmudul Hasan, Sujoy Paul, Anastasios I. Mourikis, Amit K. Roy-Chowdhury

Abstract

Activity recognition is a challenging problem with many practical applications. In addition to visual features, recent approaches have benefited from the use of context, e.g., inter-relationships among the activities and objects. However, these approaches require the data to be labeled and entirely available beforehand, and they are not designed to be updated continuously, which makes them unsuitable for surveillance applications. In contrast, we propose a continuous-learning framework for context-aware activity recognition from unlabeled video, which has two distinct advantages over existing methods. First, it employs a novel active-learning technique that not only exploits the informativeness of the individual activities but also utilizes their contextual information during query selection; this leads to a significant reduction in expensive manual annotation effort. Second, the learned models can be adapted online as more data becomes available. We formulate a conditional random field model that encodes the context and devise an information-theoretic approach that utilizes the entropy and mutual information of the nodes to compute the set of most informative queries, which are labeled by a human. These labels are combined with graphical inference techniques for incremental updates. We provide a theoretical formulation of the active learning framework with an analytic solution. Experiments on six challenging datasets demonstrate that our framework achieves superior performance with significantly less manual labeling.
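The core idea of the query-selection step, scoring each node by its own entropy plus the mutual information it shares with its neighbors in the graph, can be sketched as follows. This is a minimal illustration of entropy/mutual-information-based scoring over node marginals and edge joints, not the paper's actual CRF formulation; the function names, the additive scoring rule, and the top-k selection are simplifying assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(joint):
    """Mutual information I(X; Y) computed from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal of X (rows)
    py = joint.sum(axis=0)  # marginal of Y (columns)
    mi = 0.0
    for i in range(joint.shape[0]):
        for j in range(joint.shape[1]):
            if joint[i, j] > 0:
                mi += joint[i, j] * np.log(joint[i, j] / (px[i] * py[j]))
    return float(mi)

def select_queries(marginals, edge_joints, edges, k):
    """Score each node by its marginal entropy plus the mutual
    information on its incident edges (a simplified stand-in for the
    paper's information-theoretic criterion), and return the indices
    of the k highest-scoring nodes as the queries to label."""
    scores = np.array([entropy(m) for m in marginals])
    for (i, j), joint in zip(edges, edge_joints):
        mi = mutual_information(joint)
        scores[i] += mi
        scores[j] += mi
    return list(np.argsort(scores)[::-1][:k])
```

For example, given three activity nodes where node 0 has a uniform marginal and is strongly coupled to node 2 through an edge, `select_queries` picks node 0 first: labeling it is informative both because the model is uncertain about it and because its label constrains its neighbor through the context.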

URL

https://arxiv.org/abs/1904.04406

PDF

https://arxiv.org/pdf/1904.04406.pdf

