Abstract
Weakly-supervised temporal action localization (WTAL) is a practical yet challenging task. Because the datasets are large, most existing methods extract features with a network pretrained on other datasets, and these features are not well suited to WTAL. To address this problem, researchers have designed several feature-enhancement modules that improve the performance of the localization module, especially by modeling the temporal relationships between snippets. However, all of them neglect the adverse effects of ambiguous information, which reduces the discriminability of other snippets. To address this, we propose the Discriminability-Driven Graph Network (DDG-Net), which explicitly models ambiguous snippets and discriminative snippets with well-designed connections, preventing the transmission of ambiguous information and enhancing the discriminability of snippet-level representations. Additionally, we propose a feature consistency loss to prevent the assimilation of features and to drive the graph convolution network to generate more discriminative representations. Extensive experiments on the THUMOS14 and ActivityNet1.2 benchmarks demonstrate the effectiveness of DDG-Net, establishing new state-of-the-art results on both datasets. Source code is available at \url{this https URL}.
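The idea of connections that block ambiguous information can be illustrated with a small sketch. This is not the DDG-Net implementation: the score thresholds, the cosine-similarity edge weights, the rule that action and background snippets do not exchange messages, and all function names (`split_snippets`, `build_adjacency`, `propagate`) are assumptions made here for illustration only. The one property taken from the abstract is that ambiguous snippets do not transmit information to other snippets.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def split_snippets(scores, low=0.3, high=0.7):
    """Label snippets from an action score in [0, 1].

    The thresholds are hypothetical: scores >= high are 'action',
    scores <= low are 'background', everything else is 'ambiguous'.
    """
    labels = []
    for s in scores:
        if s >= high:
            labels.append("action")
        elif s <= low:
            labels.append("background")
        else:
            labels.append("ambiguous")
    return labels


def build_adjacency(features, labels):
    """Build adj where adj[i][j] weights the message sent from snippet j to snippet i."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                adj[i][j] = 1.0  # self-loop keeps each snippet's own feature
                continue
            if labels[j] == "ambiguous":
                continue  # ambiguous snippets never send messages
            if labels[i] != "ambiguous" and labels[i] != labels[j]:
                continue  # assumed: action and background do not mix
            adj[i][j] = max(0.0, cosine(features[i], features[j]))
    return adj


def propagate(features, adj):
    """One row-normalized aggregation step (a graph convolution without learned weights)."""
    dim = len(features[0])
    out = []
    for row in adj:
        total = sum(row)  # at least 1.0 because of the self-loop
        out.append([
            sum(w * features[j][d] for j, w in enumerate(row)) / total
            for d in range(dim)
        ])
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
    labels = split_snippets([0.9, 0.8, 0.5, 0.1])
    enhanced = propagate(feats, build_adjacency(feats, labels))
    for label, vec in zip(labels, enhanced):
        print(label, [round(x, 3) for x in vec])
```

In this sketch the ambiguous snippet still receives messages from discriminative snippets, so its representation can be refined, while its own feature never reaches any other node.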
URL
https://arxiv.org/abs/2307.16415