JCDNet: Joint of Common and Definite phases Network for Weakly Supervised Temporal Action Localization

2023-03-30 11:09:02
Yifu Liu, Xiaoxia Li, Zhiling Luo, Wei Zhou

Abstract

Weakly-supervised temporal action localization aims to localize action instances in untrimmed videos with only video-level supervision. We observe that different actions can share common phases, e.g., the run-up in HighJump and LongJump. We call such actions conjoint actions; their remaining parts are definite phases, e.g., leaping over the bar in HighJump. Compared with the common phases, the definite phases are easier to localize for existing methods. Most of these methods formulate the task as Multiple Instance Learning, in which the common phases tend to be confused with the background, hurting the localization completeness of conjoint actions. To tackle this challenge, we propose a Joint of Common and Definite phases Network (JCDNet) that improves the feature discriminability of conjoint actions. Specifically, we design a Class-Aware Discriminative module that enhances the contribution of the common phases to classification under the guidance of coarse definite-phase features. In addition, we introduce a temporal attention module that learns robust action-ness scores by modeling temporal dependencies, distinguishing the common phases from the background. Extensive experiments on three datasets (THUMOS14, ActivityNet v1.2, and a conjoint-action subset) demonstrate that JCDNet achieves competitive performance against state-of-the-art methods.

Keywords: weakly-supervised learning, temporal action localization, conjoint action
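
The abstract builds on the standard MIL formulation of weakly-supervised temporal action localization: a snippet-level classifier produces a class-activation sequence, a temporal attention branch scores the action-ness of each snippet, and top-k pooling turns the background-suppressed snippet scores into video-level predictions trained with video-level labels only. The sketch below is a minimal illustration of that generic pipeline, not the authors' JCDNet implementation (which additionally has the Class-Aware Discriminative module); every name and hyperparameter in it (MILLocalizationHead, snippet_dim, k_ratio, the assumed I3D feature dimension of 2048) is an illustrative assumption.

```python
# Hedged sketch of a generic MIL-style WTAL head: attention-weighted
# class-activation sequence with top-k pooling to video-level scores.
# This is NOT the authors' JCDNet code; all names are placeholders.
import torch
import torch.nn as nn


class MILLocalizationHead(nn.Module):
    def __init__(self, snippet_dim=2048, num_classes=20, k_ratio=8):
        super().__init__()
        # Temporal attention branch: one "action-ness" score per snippet,
        # used to suppress background before video-level pooling.
        self.attention = nn.Sequential(
            nn.Conv1d(snippet_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # Snippet-level classifier: produces the class-activation sequence.
        self.classifier = nn.Conv1d(snippet_dim, num_classes, kernel_size=1)
        self.k_ratio = k_ratio

    def forward(self, feats):
        # feats: (B, T, D) pre-extracted snippet features (e.g. I3D).
        x = feats.transpose(1, 2)                 # (B, D, T)
        attn = self.attention(x)                  # (B, 1, T) action-ness
        cas = self.classifier(x)                  # (B, C, T) class activations
        fg_cas = cas * attn                       # background-suppressed CAS
        # MIL: average the top-k snippet scores per class as the video score.
        k = max(1, feats.shape[1] // self.k_ratio)
        topk = torch.topk(fg_cas, k, dim=2).values
        video_logits = topk.mean(dim=2)           # (B, C)
        return video_logits, cas.transpose(1, 2), attn.squeeze(1)


if __name__ == "__main__":
    head = MILLocalizationHead()
    feats = torch.randn(2, 320, 2048)             # 2 videos, 320 snippets
    logits, cas, actionness = head(feats)
    # Train with video-level labels only (e.g. BCE on logits); at test time,
    # threshold the actionness-weighted CAS to obtain action proposals.
    print(logits.shape, cas.shape, actionness.shape)
```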

Abstract (translated)

Weakly-supervised temporal action localization aims to localize action instances in untrimmed videos using only video-level supervision. We observe that different actions record common phases, e.g., the run-up phase in HighJump and LongJump; such actions are defined as conjoint actions, and their remaining parts are definite phases, e.g., leaping over the bar in HighJump. Compared with the common phases, the definite phases are easier to localize in existing research. Most researchers formulate this task as a Multiple Instance Learning paradigm, in which the common phases tend to be confused with the background, affecting the localization completeness of conjoint actions. To address this challenge, we propose a Joint of Common and Definite phases Network (JCDNet) that improves the feature discriminability of conjoint actions. Specifically, we design a Class-Aware Discriminative module that enhances the contribution of the common phases to classification under the guidance of coarse definite-phase features. In addition, we introduce a temporal attention module that learns robust action-ness scores by modeling temporal dependencies, distinguishing the common phases from the background. Extensive experiments on three datasets (THUMOS14, ActivityNet v1.2, and a conjoint-action subset) demonstrate that JCDNet achieves competitive performance against existing methods.

Keywords: weakly-supervised learning, temporal action localization, conjoint action

URL

https://arxiv.org/abs/2303.17294

PDF

https://arxiv.org/pdf/2303.17294.pdf

