HR-Pro: Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation

2023-08-24 07:19:11
Huaxin Zhang, Xiang Wang, Xiaohao Xu, Zhiwu Qing, Changxin Gao, Nong Sang

Abstract

Point-supervised Temporal Action Localization (PSTAL) is an emerging research direction for label-efficient learning. However, current methods mainly focus on optimizing the network at either the snippet level or the instance level, neglecting the inherent reliability of point annotations at both levels. In this paper, we propose a Hierarchical Reliability Propagation (HR-Pro) framework, which consists of two reliability-aware stages: Snippet-level Discrimination Learning and Instance-level Completeness Learning. Both stages explore the efficient propagation of high-confidence cues in point annotations. For snippet-level learning, we introduce an online-updated memory to store reliable snippet prototypes for each class. We then employ a Reliability-aware Attention Block to capture both intra-video and inter-video dependencies of snippets, resulting in more discriminative and robust snippet representations. For instance-level learning, we propose a point-based proposal generation approach that connects snippets and instances and produces high-confidence proposals for further optimization at the instance level. Through multi-level reliability-aware learning, we obtain more reliable confidence scores and more accurate temporal boundaries for predicted proposals. Our HR-Pro achieves state-of-the-art performance on multiple challenging benchmarks, including an impressive average mAP of 60.3% on THUMOS14. Notably, HR-Pro surpasses all previous point-supervised methods by a large margin and even outperforms several competitive fully supervised methods. Code will be available at this https URL.
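
The online-updated memory of reliable snippet prototypes mentioned in the abstract can be pictured with a small sketch. The abstract does not state the update rule, so the exponential-moving-average update, the cosine similarity, and every name here (`PrototypeMemory`, `momentum`, `update`, `similarity`) are assumptions made for illustration, not the authors' implementation.

```python
import math


class PrototypeMemory:
    """Hypothetical per-class prototype store (illustrative only)."""

    def __init__(self, num_classes, dim, momentum=0.9):
        # momentum is an assumed hyperparameter; the abstract gives no value.
        self.momentum = momentum
        self.prototypes = [[0.0] * dim for _ in range(num_classes)]
        self.initialized = [False] * num_classes

    def update(self, class_id, feature):
        """Blend a reliable snippet feature into its class prototype."""
        if not self.initialized[class_id]:
            # The first reliable feature seen for a class becomes its prototype.
            self.prototypes[class_id] = list(feature)
            self.initialized[class_id] = True
            return
        m = self.momentum
        old = self.prototypes[class_id]
        # Assumed rule: exponential moving average of old prototype and new feature.
        self.prototypes[class_id] = [
            m * o + (1.0 - m) * f for o, f in zip(old, feature)
        ]

    def similarity(self, class_id, feature):
        """Cosine similarity between a snippet feature and a class prototype."""
        proto = self.prototypes[class_id]
        dot = sum(a * b for a, b in zip(proto, feature))
        norm = math.sqrt(sum(a * a for a in proto)) * math.sqrt(
            sum(b * b for b in feature)
        )
        return dot / norm if norm > 0 else 0.0


# Usage: feed features of point-annotated (high-confidence) snippets, then
# compare an unlabeled snippet feature against the stored class prototype.
memory = PrototypeMemory(num_classes=2, dim=2, momentum=0.5)
memory.update(0, [1.0, 0.0])
memory.update(0, [0.0, 1.0])
score = memory.similarity(0, [1.0, 1.0])
```

In this sketch only features from annotated points would be passed to `update`, which is one plausible reading of "reliable" in the abstract; how HR-Pro actually selects and uses the prototypes is described in the paper itself.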

URL

https://arxiv.org/abs/2308.12608

PDF

https://arxiv.org/pdf/2308.12608.pdf

