Abstract
Automatic live video commenting is attracting increasing attention due to its significance in narration generation, topic explanation, and related tasks. However, current methods do not account for the diverse sentiments of the generated comments. Sentiment is critical in interactive commenting, yet it has received little research attention so far. In this paper, we therefore propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, which consists of a sentiment-oriented diversity encoder module and a batch attention module, to achieve diverse video commenting with multiple sentiments and multiple semantics. Specifically, the sentiment-oriented diversity encoder combines a VAE with a random mask mechanism to achieve semantic diversity under sentiment guidance; its output is then fused with cross-modal features to generate live video comments. Furthermore, we propose a batch attention module to alleviate the problem of missing sentimental samples caused by data imbalance, which is common in live videos because video popularity varies. Extensive experiments on the Livebot and VideoIC datasets demonstrate that the proposed So-TVAE outperforms state-of-the-art methods in terms of the quality and diversity of generated comments. Related code is available online.
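The abstract does not spell out how the batch attention module works. As a rough illustration only, the sketch below assumes it applies scaled dot-product attention across the samples of a mini-batch, so that each sample's features become a weighted mix of all samples in the batch, letting rare-sentiment samples borrow information from the others. The function names `softmax` and `batch_attention` and the toy feature vectors are hypothetical and are not taken from the So-TVAE code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def batch_attention(batch):
    """Attend across the batch dimension (an assumed reading of the module).

    batch: list of B feature vectors, each a list of D floats.
    Returns B vectors, each a convex combination of all B inputs.
    """
    dim = len(batch[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in batch:
        # Similarity of this sample to every sample in the batch.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in batch
        ]
        weights = softmax(scores)
        # Weighted sum of all samples' features, one dimension at a time.
        mixed.append([
            sum(w * sample[d] for w, sample in zip(weights, batch))
            for d in range(dim)
        ])
    return mixed


# Toy batch of three 2-dimensional feature vectors (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = batch_attention(features)
```

A real implementation would operate on tensors with learned query, key, and value projections; this version omits them to keep the cross-sample mixing visible.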
URL
https://arxiv.org/abs/2404.12782