Faster Learning of Temporal Action Proposal via Sparse Multilevel Boundary Generator

2023-03-06 14:26:56
Qing Song, Yang Zhou, Mengjie Hu, Chun Liu

Abstract

Temporal action localization in videos remains a significant challenge in computer vision. The boundary-sensitive method has been widely adopted, but it makes incomplete use of intermediate and global information and relies on an inefficient proposal feature generator. To address these limitations, we propose a novel framework, the Sparse Multilevel Boundary Generator (SMBG), which enhances the boundary-sensitive method with boundary classification and action completeness regression. SMBG features a multi-level boundary module that speeds up processing by gathering boundary information at different lengths. We also introduce a sparse extraction confidence head that distinguishes information inside the action from information outside it, further optimizing the proposal feature generator. To improve the synergy between the multiple branches and to balance positive and negative samples, we propose a global guidance loss. We evaluate our method on two popular benchmarks, ActivityNet-1.3 and THUMOS14, where it achieves state-of-the-art performance with faster inference (2.47× the speed of BSN++ and 2.12× the speed of DBG). These results show that SMBG is a simpler and more efficient solution for generating temporal action proposals, improving both the accuracy and the speed of temporal action localization. The code and models are made available at this https URL.
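
The abstract names action completeness regression without defining its target. In boundary-sensitive proposal work, completeness is commonly measured as the temporal IoU between a candidate proposal and the ground-truth actions. The sketch below illustrates that measure under this assumption; it is not taken from the SMBG code, and the names `temporal_iou` and `completeness_target` are hypothetical.

```python
from typing import List, Tuple

# A temporal segment as (start, end) in seconds, with start <= end.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    # Overlap is clipped at zero when the segments are disjoint.
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def completeness_target(proposal: Segment, ground_truth: List[Segment]) -> float:
    """Assumed regression target: best temporal IoU against any annotated action."""
    # default=0.0 covers videos with no annotated actions.
    return max((temporal_iou(proposal, gt) for gt in ground_truth), default=0.0)


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    annotated = [(0.0, 4.0), (3.0, 6.0)]
    print(completeness_target((2.0, 6.0), annotated))
```

A proposal that covers an action tightly scores close to 1, while one that overlaps only part of an action or overshoots its boundaries scores lower, which is what makes this quantity a natural regression target for ranking proposals.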

URL

https://arxiv.org/abs/2303.03166

PDF

https://arxiv.org/pdf/2303.03166.pdf

