Abstract
Deep learning models risk relying on spurious cues to make predictions, such as recognizing an action from its background scene. This can severely degrade open-set action recognition performance when the testing samples follow a different scene distribution from the training samples. To mitigate this problem, we propose Scene-debiasing Open-set Action Recognition (SOAR), which features an adversarial scene reconstruction module and an adaptive adversarial scene classification module. The former prevents a decoder from reconstructing the video background from the video features, and thus reduces the background information retained during feature learning. The latter aims to confuse scene-type classification from the video features, with a specific emphasis on the action foreground, and helps the model learn scene-invariant information. In addition, we design an experiment to quantify scene bias. The results indicate that current open-set action recognizers are biased toward the scene, and that SOAR better mitigates this bias. Furthermore, extensive experiments demonstrate that our method outperforms state-of-the-art methods, and ablation studies confirm the effectiveness of the proposed modules.
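The adversarial scene classification idea can be illustrated with a toy objective. The sketch below is not the paper's implementation. It assumes the scene classifier emits logits at each spatio-temporal location, that a per-location foreground weight is already available, and that the encoder is trained on the action loss minus a scaled scene loss. All function names, the data layout, and the trade-off weight `lam` are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one softmax prediction against an integer label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def weighted_scene_loss(scene_logits, scene_label, fg_weights):
    """Scene-classification loss averaged over locations, weighted toward the foreground.

    scene_logits: one list of scene logits per location (hypothetical layout).
    fg_weights:   non-negative foreground scores per location (assumed given).
    """
    total = sum(fg_weights)
    weighted = sum(
        w * cross_entropy(logits, scene_label)
        for logits, w in zip(scene_logits, fg_weights)
    )
    return weighted / total


def encoder_objective(action_loss, scene_loss, lam=0.5):
    """Toy adversarial objective for the encoder (an assumption, not the paper's loss).

    Minimizing this value lowers the action loss while raising the scene loss,
    so scene type becomes harder to predict from the features.
    """
    return action_loss - lam * scene_loss
```

In this sketch the scene classifier itself would still be trained to minimize `weighted_scene_loss`, while the encoder minimizes `encoder_objective`. The opposing signs are what make the setup adversarial.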
URL
https://arxiv.org/abs/2309.01265