Abstract
The goal of Temporal Action Localization (TAL) is to find the categories and temporal boundaries of actions in an untrimmed video. Most TAL methods rely heavily on action recognition models, which are sensitive to action labels rather than to temporal boundaries. More importantly, few works consider background frames that resemble action frames at the pixel level but differ from them semantically, which also leads to inaccurate temporal boundaries. To address this challenge, we propose a Boundary-Aware Proposal Generation (BAPG) method with contrastive learning. Specifically, we define such background frames as hard negative samples and introduce contrastive learning with hard negative mining to improve the discriminative ability of BAPG. Because BAPG is independent of existing TAL network architectures, it can be plugged into mainstream TAL models. Extensive experimental results on THUMOS14 and ActivityNet-1.3 demonstrate that BAPG can significantly improve the performance of TAL.
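To make the idea of contrastive learning with hard negatives concrete, the following is a minimal sketch of a generic InfoNCE-style loss for a single anchor feature. It is not the loss used in BAPG, which the abstract does not specify. The function names (`cosine`, `info_nce`), the `temperature` value, and the two-dimensional toy features are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, hard_negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (an assumption, not the paper's loss).

    The positive is pulled toward the anchor, and every hard negative
    is pushed away from it.
    """
    # Index 0 holds the positive pair; the rest are negative pairs.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in hard_negatives]

    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))

    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]


# Hypothetical toy features: an action frame, another frame of the same
# action, and two background frames (one dissimilar, one similar).
action = [1.0, 0.0]
same_action = [0.9, 0.1]
easy_background = [0.0, 1.0]
hard_background = [0.8, 0.3]

easy_loss = info_nce(action, same_action, [easy_background])
hard_loss = info_nce(action, same_action, [hard_background])
```

In this toy setup the background frame that lies close to the action frame in feature space yields a larger loss than the dissimilar one, so it contributes a stronger training signal. That is the usual motivation for mining hard negatives.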
URL
https://arxiv.org/abs/2309.13810