Paper Reading AI Learner

PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization

2024-03-27 18:08:14
Edward Fish, Jon Weinbren, Andrew Gilbert

Abstract

This paper introduces a novel approach to temporal action localization (TAL) in few-shot learning. Our work addresses the inherent limitations of conventional single-prompt learning methods, which often overfit because they cannot generalize across the varying contexts of real-world videos. Recognizing the diversity of camera views, backgrounds, and objects in videos, we propose a multi-prompt learning framework enhanced with optimal transport. This design allows the model to learn a set of diverse prompts for each action, capturing general characteristics more effectively and distributing the representation to mitigate the risk of overfitting. Furthermore, by employing optimal transport theory, we efficiently align these prompts with action features, optimizing for a comprehensive representation that adapts to the multifaceted nature of video data. Our experiments demonstrate significant improvements in action localization accuracy and robustness in few-shot settings on the challenging standard benchmarks THUMOS-14 and EPIC-Kitchens-100, highlighting the efficacy of our multi-prompt optimal transport approach in overcoming the challenges of conventional few-shot TAL methods.
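The abstract describes aligning a set of learned prompts with action features via optimal transport. As an illustrative sketch only (not the paper's implementation), the core alignment step can be written as entropy-regularized optimal transport solved with Sinkhorn iterations: the transport plan couples M prompt embeddings to T frame features under a cosine-distance cost, and the resulting OT cost serves as a prompt-to-action alignment score. All array names, sizes, and the choice of uniform marginals below are assumptions for the toy example.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: (M, T) cost matrix; a: (M,) and b: (T,) marginal weights.
    Returns the (M, T) transport plan whose rows sum (approximately)
    to a and whose columns sum to b.
    """
    K = np.exp(-cost / eps)  # Gibbs kernel from the regularized cost
    u = np.ones_like(a)
    for _ in range(n_iters):
        # Alternate scaling so the plan matches both marginals.
        u = a / (K @ (b / (K.T @ u)))
    v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Toy example: M learned prompt embeddings vs. T video-frame features
# (random stand-ins for the real encoder outputs).
rng = np.random.default_rng(0)
M, T, d = 4, 8, 16
prompts = rng.normal(size=(M, d))
frames = rng.normal(size=(T, d))

# Cost = 1 - cosine similarity between each prompt and frame feature.
pn = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
fn = frames / np.linalg.norm(frames, axis=1, keepdims=True)
cost = 1.0 - pn @ fn.T

a = np.full(M, 1.0 / M)  # uniform weight over prompts (assumption)
b = np.full(T, 1.0 / T)  # uniform weight over frames (assumption)
plan = sinkhorn(cost, a, b)

# OT cost: how well this prompt set, as a whole, matches the action.
ot_distance = float((plan * cost).sum())
```

In this framing, minimizing the OT cost encourages the prompts to spread over the different visual contexts of an action rather than all collapsing onto one dominant view.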

URL

https://arxiv.org/abs/2403.18915

PDF

https://arxiv.org/pdf/2403.18915.pdf

