Faster Learning of Temporal Action Proposal via Sparse Multilevel Boundary Generator

Abstract

Temporal action localization in videos is a challenging problem in computer vision. Although the boundary-sensitive method has been widely adopted, it makes incomplete use of intermediate and global information and relies on an inefficient proposal feature generator. To address these limitations, we propose a novel framework, the Sparse Multilevel Boundary Generator (SMBG), which enhances the boundary-sensitive method with boundary classification and action completeness regression. SMBG features a multi-level boundary module that enables faster processing by gathering boundary information at different lengths. Additionally, we introduce a sparse extraction confidence head that distinguishes information inside and outside the action, further optimizing the proposal feature generator. To improve the synergy between multiple branches and to balance positive and negative samples, we propose a global guidance loss. Evaluated on two popular benchmarks, ActivityNet-1.3 and THUMOS14, our method achieves state-of-the-art performance with faster inference (2.47× the speed of BSN++ and 2.12× that of DBG). These results show that SMBG offers a more efficient and simpler solution for generating temporal action proposals. Our framework has the potential to advance computer vision and to improve both the accuracy and the speed of temporal action localization in video analysis. The code and models are available at this https URL.
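The abstract only names SMBG's components, so the sketch below is not the authors' implementation. Under stated assumptions, it illustrates two generic ingredients of the boundary-sensitive family that SMBG builds on. The first is BSN-style proposal generation: candidate starts and ends are taken at probability peaks or above 0.9 of the maximum, paired, and scored by the product of their probabilities. The second is a class-balanced binary logistic loss of the kind BSN uses to balance positive and negative samples. The function names (candidate_boundaries, generate_proposals, balanced_bce) are hypothetical. The multi-level boundary module, sparse extraction confidence head and global guidance loss are not modeled.

```python
# Hypothetical helpers illustrating generic BSN-style steps; not SMBG's code.
import math
from typing import List, Optional, Tuple


def candidate_boundaries(probs: List[float], ratio: float = 0.9) -> List[int]:
    """Indices that are local probability peaks or exceed ratio * max(probs).

    Assumption: follows the candidate-selection rule described for BSN;
    SMBG's own rule is not stated in the abstract.
    """
    if not probs:
        return []
    threshold = ratio * max(probs)
    picked = []
    for t, p in enumerate(probs):
        # Treat positions outside the sequence as -inf so endpoints can be peaks.
        left = probs[t - 1] if t > 0 else -math.inf
        right = probs[t + 1] if t + 1 < len(probs) else -math.inf
        if (p > left and p > right) or p > threshold:
            picked.append(t)
    return picked


def generate_proposals(
    start_probs: List[float],
    end_probs: List[float],
    max_len: Optional[int] = None,
) -> List[Tuple[int, int, float]]:
    """Pair every candidate start with every later candidate end.

    Each proposal is scored by p_start * p_end. BSN additionally multiplies
    this by a learned proposal confidence, which is omitted here.
    """
    starts = candidate_boundaries(start_probs)
    ends = candidate_boundaries(end_probs)
    proposals = []
    for s in starts:
        for e in ends:
            if e <= s:
                continue  # an action must end after it starts
            if max_len is not None and e - s > max_len:
                continue  # optional duration limit
            proposals.append((s, e, start_probs[s] * end_probs[e]))
    # Highest-scoring proposals first.
    proposals.sort(key=lambda prop: prop[2], reverse=True)
    return proposals


def balanced_bce(
    probs: List[float],
    targets: List[float],
    threshold: float = 0.5,
    eps: float = 1e-6,
) -> float:
    """Binary logistic loss whose positive and negative terms are reweighted
    by n / n_pos and n / n_neg, as in BSN's weighted loss (assumption: SMBG's
    global guidance loss may differ).
    """
    n = len(targets)
    if n == 0:
        return 0.0
    # Binarize soft targets at the threshold.
    labels = [1.0 if g > threshold else 0.0 for g in targets]
    n_pos = sum(labels)
    n_neg = n - n_pos
    # Guard against a batch with no positives (or no negatives).
    alpha_pos = n / max(n_pos, 1.0)
    alpha_neg = n / max(n_neg, 1.0)
    total = 0.0
    for p, b in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += alpha_pos * b * math.log(p) + alpha_neg * (1.0 - b) * math.log(1.0 - p)
    return -total / n
```

For example, with start probabilities [0.1, 0.8, 0.2, 0.1, 0.1, 0.1] and end probabilities [0.1, 0.1, 0.1, 0.2, 0.9, 0.1], generate_proposals returns a single proposal spanning indices 1 to 4, scored by the product of the two peaks. The n / n_pos and n / n_neg weights keep the rare positive locations from being swamped by the many background locations.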

URL

https://arxiv.org/abs/2303.03166

PDF

https://arxiv.org/pdf/2303.03166.pdf