Abstract
Temporal action localization has recently attracted significant interest in the computer vision community. Despite this progress, it remains hard to identify which aspects of the proposed methods contribute most to the gains in localization performance. To address this issue, we conduct ablation experiments on feature extraction methods, fixed-size feature representation methods, and training strategies, and report how each influences overall performance. Based on our findings, we propose a two-stage detector that outperforms the state of the art on THUMOS14, achieving an mAP of 44.2% at tIoU=0.5.
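For readers unfamiliar with the reported metric: at tIoU=0.5, a predicted segment is typically counted as correct when its temporal intersection-over-union with a ground-truth segment of the same class reaches 0.5. The following is a minimal sketch of that overlap test only, assuming segments are (start, end) pairs in seconds. The function names are illustrative and not taken from the paper, and the full mAP computation (ranking by confidence, matching each ground-truth segment at most once, averaging over classes) is omitted.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    # Length of the overlap; zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold=0.5):
    """Hypothetical helper: does the prediction overlap enough at this tIoU threshold?"""
    return temporal_iou(pred, gt) >= threshold
```

For example, a prediction covering seconds 2 to 10 against a ground-truth segment covering seconds 0 to 10 has a tIoU of 0.8 and would pass the 0.5 threshold.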
URL
https://arxiv.org/abs/1905.10608