Abstract
Video memorability refers to how well a video is recalled after viewing, and it plays a crucial role in creating content that stays with viewers. Existing models typically extract multimodal features to predict video memorability scores but often fail to fully exploit motion cues: because labeled data is scarce, fine-tuning the motion feature extractor degrades the representation of motion features. In this paper, we introduce the Text-Motion Cross-modal Contrastive Loss (TMCCL), a multimodal video memorability prediction model designed to enhance the representation of motion features. The model uses the similarity of text descriptions across videos to build positive and negative motion sample sets for a given target. This allows it to learn similar feature representations for semantically related motion content, resulting in more accurate memorability predictions. Our model achieves state-of-the-art performance on two video memorability prediction datasets. Moreover, the potential applications of video memorability prediction remain underexplored. To address this gap, we present Memorability Weighted Correction for Video Summarization (MWCVS), which uses video memorability prediction to reduce subjectivity in video summarization labels. Experimental results on two video summarization datasets demonstrate the effectiveness of MWCVS and show that video memorability prediction has promising applications.
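To make the text-guided sampling idea concrete, here is a minimal sketch of how text description similarity could define positive and negative motion samples for a contrastive objective. It is not the authors' implementation: the abstract does not give the formulation, so the cosine-similarity threshold, the temperature, the supervised-contrastive (InfoNCE-style) form of the loss, and all function names are assumptions made for illustration. Embeddings are plain lists of floats and are assumed to be non-zero vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def split_by_text_similarity(text_emb, anchor, threshold=0.8):
    """Split the other videos into positives and negatives for one anchor.

    A video is a positive when its text embedding is at least `threshold`
    similar to the anchor's text embedding (threshold is an assumed value).
    """
    positives, negatives = [], []
    for j in range(len(text_emb)):
        if j == anchor:
            continue
        if cosine(text_emb[anchor], text_emb[j]) >= threshold:
            positives.append(j)
        else:
            negatives.append(j)
    return positives, negatives


def text_guided_contrastive_loss(text_emb, motion_emb, threshold=0.8, temperature=0.1):
    """Hypothetical contrastive loss over motion features.

    Positives are chosen from text similarity; the loss itself is computed
    on motion similarity, so motion features of videos with similar
    descriptions are pulled together and the rest are pushed apart.
    """
    n = len(motion_emb)
    losses = []
    for i in range(n):
        positives, _ = split_by_text_similarity(text_emb, i, threshold)
        if not positives:
            continue  # anchors without any positive contribute nothing
        # Temperature-scaled motion similarity to every other video.
        logits = {
            j: cosine(motion_emb[i], motion_emb[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over positives and negatives.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Average negative log-probability of the positives.
        losses.append(-sum(logits[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0
```

Under these assumptions the loss is small when videos with similar descriptions already have similar motion features and large when they do not, which is the behavior the abstract describes as learning similar representations for semantically related motion content.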
URL
https://arxiv.org/abs/2506.08649