Paper Reading AI Learner

Improving Action Localization by Progressive Cross-stream Cooperation

2019-05-28 02:29:12
Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu

Abstract

Spatio-temporal action localization consists of three levels of tasks: spatial localization, action classification, and temporal segmentation. In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework that uses both region proposals and features from one stream (i.e., Flow/RGB) to help the other stream (i.e., RGB/Flow), iteratively improving action localization results and producing better bounding boxes. Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples and thus learn better action detection models. Second, we propose a new message passing approach that passes information from one stream to the other in order to learn better feature representations, which also leads to better action detection models. As a result, our iterative framework progressively improves action localization results at the frame level. To improve action localization results at the video level, we additionally propose a new strategy that trains class-specific actionness detectors for better temporal segmentation; these detectors can be readily learnt by focusing on "confusing" samples from the same action class. Comprehensive experiments on two benchmark datasets, UCF-101-24 and J-HMDB, demonstrate the effectiveness of our newly proposed approaches for spatio-temporal action localization in realistic scenarios.
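
To make the cooperation scheme described in the abstract concrete, the sketch below outlines the frame-level PCSC loop in Python. It is only a minimal, conceptual rendering under stated assumptions: the feature extractor, message-passing function, and detector are hypothetical dummy stand-ins, not the authors' networks, and only the control flow (pooling proposals from both streams, exchanging messages between streams, and re-detecting over a few stages) follows the paper's description. The class-specific actionness detectors used for video-level temporal segmentation are not shown.

```python
# Conceptual sketch of the Progressive Cross-stream Cooperation (PCSC) loop.
# All functions below are hypothetical placeholders used only to illustrate
# the iterative cross-stream cooperation described in the abstract.
import numpy as np


def extract_features(frames, boxes):
    # Dummy per-box features; a real system would use RoI-pooled CNN features
    # from the RGB or optical-flow backbone.
    return np.random.rand(len(boxes), 256)


def message_passing(own_feat, other_feat, weight=0.5):
    # Toy cross-stream message passing: blend the helper stream's features
    # into the current stream's representation.
    return own_feat + weight * other_feat


def detect(features, boxes):
    # Dummy detector: returns (possibly refined) boxes and confidence scores.
    scores = np.random.rand(len(boxes))
    return boxes, scores


def pcsc_frame_level(rgb_frames, flow_frames, rgb_boxes, flow_boxes, stages=2):
    """Progressive cross-stream cooperation for one frame (conceptual only)."""
    for _ in range(stages):
        # 1) Enlarge the proposal set by pooling the latest proposals from
        #    both streams (a real system would also apply NMS/deduplication).
        union_boxes = np.concatenate([rgb_boxes, flow_boxes], axis=0)

        # 2) Extract per-stream features for the shared proposal set.
        rgb_feat = extract_features(rgb_frames, union_boxes)
        flow_feat = extract_features(flow_frames, union_boxes)

        # 3) Cross-stream message passing in both directions.
        rgb_feat_mp = message_passing(rgb_feat, flow_feat)
        flow_feat_mp = message_passing(flow_feat, rgb_feat)

        # 4) Re-detect with the improved features; the refined boxes become
        #    the "latest" proposals for the next cooperation stage.
        rgb_boxes, rgb_scores = detect(rgb_feat_mp, union_boxes)
        flow_boxes, flow_scores = detect(flow_feat_mp, union_boxes)

    return rgb_boxes, rgb_scores


if __name__ == "__main__":
    boxes0 = np.array([[10, 10, 50, 80], [20, 15, 60, 90]], dtype=float)
    final_boxes, final_scores = pcsc_frame_level(None, None, boxes0, boxes0.copy())
    print(final_boxes.shape, final_scores.shape)
```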

URL

https://arxiv.org/abs/1905.11575

PDF

https://arxiv.org/pdf/1905.11575.pdf

