Abstract
Activity recognition is a challenging problem with many practical applications. In addition to visual features, recent approaches have benefited from the use of context, e.g., inter-relationships among the activities and objects. However, these approaches require data that is labeled and entirely available beforehand, and they are not designed to be updated continuously, which makes them unsuitable for surveillance applications. In contrast, we propose a continuous-learning framework for context-aware activity recognition from unlabeled video, which has two distinct advantages over existing methods. First, it employs a novel active-learning technique that not only exploits the informativeness of the individual activities but also utilizes their contextual information during query selection; this leads to a significant reduction in expensive manual annotation effort. Second, the learned models can be adapted online as more data becomes available. We formulate a conditional random field model that encodes the context and devise an information-theoretic approach that utilizes entropy and mutual information of the nodes to compute the set of most informative queries, which are then labeled by a human. These labels are combined with graphical inference techniques for incremental updates. We provide a theoretical formulation of the active-learning framework with an analytic solution. Experiments on six challenging datasets demonstrate that our framework achieves superior performance with significantly less manual labeling.
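The query-selection idea described above — scoring each CRF node by its own entropy plus the mutual information it shares with neighboring nodes, then querying the highest-scoring nodes — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the node marginals and pairwise joint distributions would come from CRF inference, the additive scoring rule is an assumed simplification, and all function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a probability vector; small epsilon avoids log(0)."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + 1e-12)))

def mutual_information(joint):
    """MI of a pairwise joint distribution: H(X) + H(Y) - H(X, Y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal of the first node
    py = joint.sum(axis=0)  # marginal of the second node
    return entropy(px) + entropy(py) - entropy(joint.ravel())

def select_queries(marginals, joints, k):
    """Pick the k most informative nodes to label.

    marginals: {node_id: 1-D marginal distribution}  (assumed CRF inference output)
    joints:    {(i, j): 2-D joint distribution over edge (i, j)}
    Score = node entropy + sum of MI with its neighbors (an assumed combination).
    """
    scores = {i: entropy(p) for i, p in marginals.items()}
    for (i, j), joint in joints.items():
        mi = mutual_information(joint)
        scores[i] += mi
        scores[j] += mi
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A node with a near-uniform marginal (the model is uncertain about it) and strong statistical coupling to its neighbors is ranked first, so one human label on it resolves the most ambiguity in the graph.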
URL
https://arxiv.org/abs/1904.04406