Abstract
The dynamic nature of esports makes matches difficult for average viewers to follow. Esports broadcasts feature expert game casters, but caster commentary alone is not enough to fully convey the game situation. Commentary can be enriched by incorporating diverse multimodal esports information, including audience chat and emotions, game audio, and match event information. This paper introduces GAME-MUG, a new multimodal dataset for game situation understanding and audience-engaged commentary generation, along with a strong baseline. The dataset is collected from 2020-2022 League of Legends (LOL) live streams on YouTube and Twitch and includes multimodal esports game information, covering text, audio, and time-series event logs, for detecting the game situation. In addition, we propose a new audience-conversation-augmented commentary dataset that covers both game situation and audience conversation understanding, and we introduce a robust joint multimodal dual learning model as a baseline. We evaluate the model's game situation/event understanding ability and commentary generation capability, demonstrating the effectiveness of the multimodal coverage and the joint integration learning approach.
URL
https://arxiv.org/abs/2404.19175