JCDNet: Joint of Common and Definite phases Network for Weakly Supervised Temporal Action Localization

2023-03-30 11:09:02
Yifu Liu, Xiaoxia Li, Zhiling Luo, Wei Zhou

Abstract

Weakly-supervised temporal action localization aims to localize action instances in untrimmed videos with only video-level supervision. We observe that different actions can share common phases, e.g., the run-up in HighJump and LongJump. We call such actions conjoint actions; their remaining parts are definite phases, e.g., leaping over the bar in HighJump. Compared with the common phases, the definite phases are easier to localize for existing methods. Most of these methods formulate the task as Multiple Instance Learning, in which the common phases tend to be confused with the background, hurting the localization completeness of conjoint actions. To tackle this challenge, we propose a Joint of Common and Definite phases Network (JCDNet) that improves the feature discriminability of conjoint actions. Specifically, we design a Class-Aware Discriminative module that enhances the contribution of the common phases to classification under the guidance of coarse definite-phase features. In addition, we introduce a temporal attention module that learns robust action-ness scores by modeling temporal dependencies, distinguishing the common phases from the background. Extensive experiments on three datasets (THUMOS14, ActivityNet v1.2, and a conjoint-action subset) demonstrate that JCDNet achieves competitive performance against state-of-the-art methods.

Keywords: weakly-supervised learning, temporal action localization, conjoint action
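
The abstract builds on the standard MIL formulation of weakly-supervised temporal action localization: a snippet-level classifier produces a class-activation sequence, a temporal attention branch scores the action-ness of each snippet, and top-k pooling turns the background-suppressed snippet scores into video-level predictions trained with video-level labels only. The sketch below is a minimal illustration of that generic pipeline, not the authors' JCDNet implementation (which additionally has the Class-Aware Discriminative module); every name and hyperparameter in it (MILLocalizationHead, snippet_dim, k_ratio, the assumed I3D feature dimension of 2048) is an illustrative assumption.

```python
# Hedged sketch of a generic MIL-style WTAL head: attention-weighted
# class-activation sequence with top-k pooling to video-level scores.
# This is NOT the authors' JCDNet code; all names are placeholders.
import torch
import torch.nn as nn


class MILLocalizationHead(nn.Module):
    def __init__(self, snippet_dim=2048, num_classes=20, k_ratio=8):
        super().__init__()
        # Temporal attention branch: one "action-ness" score per snippet,
        # used to suppress background before video-level pooling.
        self.attention = nn.Sequential(
            nn.Conv1d(snippet_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # Snippet-level classifier: produces the class-activation sequence.
        self.classifier = nn.Conv1d(snippet_dim, num_classes, kernel_size=1)
        self.k_ratio = k_ratio

    def forward(self, feats):
        # feats: (B, T, D) pre-extracted snippet features (e.g. I3D).
        x = feats.transpose(1, 2)                 # (B, D, T)
        attn = self.attention(x)                  # (B, 1, T) action-ness
        cas = self.classifier(x)                  # (B, C, T) class activations
        fg_cas = cas * attn                       # background-suppressed CAS
        # MIL: average the top-k snippet scores per class as the video score.
        k = max(1, feats.shape[1] // self.k_ratio)
        topk = torch.topk(fg_cas, k, dim=2).values
        video_logits = topk.mean(dim=2)           # (B, C)
        return video_logits, cas.transpose(1, 2), attn.squeeze(1)


if __name__ == "__main__":
    head = MILLocalizationHead()
    feats = torch.randn(2, 320, 2048)             # 2 videos, 320 snippets
    logits, cas, actionness = head(feats)
    # Train with video-level labels only (e.g. BCE on logits); at test time,
    # threshold the actionness-weighted CAS to obtain action proposals.
    print(logits.shape, cas.shape, actionness.shape)
```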

Abstract (translated)

Weakly-supervised temporal action localization aims to localize action instances in untrimmed videos using only video-level supervision. We observe that different actions record common phases, e.g., the run-up phase in HighJump and LongJump; such actions are defined as conjoint actions, and their remaining parts are definite phases, e.g., leaping over the bar in HighJump. Compared with the common phases, the definite phases are easier to localize in existing research. Most researchers formulate this task as a Multiple Instance Learning paradigm, in which the common phases tend to be confused with the background, affecting the localization completeness of conjoint actions. To address this challenge, we propose a Joint of Common and Definite phases Network (JCDNet) that improves the feature discriminability of conjoint actions. Specifically, we design a Class-Aware Discriminative module that enhances the contribution of the common phases to classification under the guidance of coarse definite-phase features. In addition, we introduce a temporal attention module that learns robust action-ness scores by modeling temporal dependencies, distinguishing the common phases from the background. Extensive experiments on three datasets (THUMOS14, ActivityNet v1.2, and a conjoint-action subset) demonstrate that JCDNet achieves competitive performance against existing methods.

Keywords: weakly-supervised learning, temporal action localization, conjoint action

URL

https://arxiv.org/abs/2303.17294

PDF

https://arxiv.org/pdf/2303.17294.pdf

