Abstract
Owing to its rapid progress and broad application prospects, few-shot action recognition has attracted considerable interest. However, current methods rely predominantly on limited single-modal data and do not fully exploit the potential of multimodal information. This paper presents a novel framework that actively identifies reliable modalities for each sample using task-specific contextual cues, thereby significantly improving recognition performance. Our framework integrates an Active Sample Inference (ASI) module, which uses active inference to predict each sample's reliable modalities from posterior distributions and organizes them accordingly. Unlike reinforcement learning, active inference replaces rewards with evidence-based preferences, yielding more stable predictions. Additionally, we introduce an active mutual distillation module that enhances the representation learning of less reliable modalities by transferring knowledge from more reliable ones. During the meta-test stage, adaptive multimodal inference assigns higher weights to the more reliable modalities. Extensive experiments across multiple benchmarks demonstrate that our method significantly outperforms existing approaches.
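The abstract names three mechanisms but gives no implementation details. The minimal PyTorch sketch below illustrates one plausible reading, assuming per-modality classifier logits are already available for an episode: a negative-entropy confidence score stands in for the paper's evidence-based preference scoring, a detached-teacher KL loss stands in for active mutual distillation, and reliability-weighted fusion stands in for adaptive multimodal inference at meta-test. All function names (`reliability_scores`, `mutual_distillation_loss`, `fused_logits`) and the exact formulations are hypothetical, not the authors' actual method.

```python
import torch
import torch.nn.functional as F

def reliability_scores(logits_per_modality):
    """Score each modality by the confidence (negative entropy) of its
    posterior over classes; a simple stand-in for the paper's
    evidence-based preference scoring (assumed, not from the source)."""
    scores = []
    for logits in logits_per_modality:
        p = F.softmax(logits, dim=-1)                      # posterior per sample
        entropy = -(p * p.clamp_min(1e-8).log()).sum(-1)   # per-sample uncertainty
        scores.append(-entropy.mean())                     # higher = more reliable
    return F.softmax(torch.stack(scores), dim=0)           # normalized weights, shape (M,)

def mutual_distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL distillation from the more reliable modality (teacher, detached)
    to the less reliable one (student), in the spirit of active mutual
    distillation."""
    teacher = F.softmax(teacher_logits.detach() / T, dim=-1)
    student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * T * T

def fused_logits(logits_per_modality, weights):
    """Adaptive multimodal inference at meta-test: weight each modality's
    prediction by its estimated reliability."""
    stacked = torch.stack(logits_per_modality)             # (M, N, C)
    return (weights.view(-1, 1, 1) * stacked).sum(0)       # (N, C)

# Toy usage: two modalities (e.g., RGB and optical flow) in a 5-way episode.
rgb, flow = torch.randn(10, 5), torch.randn(10, 5)
w = reliability_scores([rgb, flow])
loss = (mutual_distillation_loss(flow, rgb) if w[0] > w[1]
        else mutual_distillation_loss(rgb, flow))          # reliable -> unreliable
preds = fused_logits([rgb, flow], w).argmax(-1)
```

Per the abstract, the actual ASI module derives these reliability estimates via active inference over posterior distributions rather than the raw entropy proxy used here; the sketch only shows where such scores would plug into distillation and weighted fusion.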
URL
https://arxiv.org/abs/2506.13322