Abstract
Media storms, dramatic outbursts of attention to a story, are central components of media dynamics and the attention landscape. Despite their significance, there has been little systematic and empirical research on the concept because of difficulties in measurement and operationalization. We introduce an iterative human-in-the-loop method to identify media storms in a large-scale corpus of news articles. The text is first transformed into signals of dispersion based on several textual characteristics. In each iteration, we apply unsupervised anomaly detection to these signals; an expert then validates each anomaly to confirm the presence of a storm, and the results are used to tune the anomaly detection in the next iteration. We demonstrate the applicability of this method in two scenarios: first, supplementing an initial list of media storms within a specific time frame; and second, detecting media storms in new time periods. We make available a media storm dataset compiled using both scenarios. Both the method and the dataset offer a basis for comprehensive empirical research into media storms, including characterizing them and predicting their outbursts and durations, in mainstream media or on social media platforms.
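The iterative loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the dispersion signals, the anomaly detector, or the tuning rule, so everything below is an assumption made for illustration: the input is a precomputed per-day signal, anomalies are days with a large absolute z-score, the expert is a hypothetical callback `expert_confirms`, and "tuning" means relaxing the threshold while the expert keeps confirming most candidates. The names `zscores`, `find_storms`, and all parameters are hypothetical and do not come from the paper.

```python
from statistics import mean, stdev
from typing import Callable, List


def zscores(signal: List[float]) -> List[float]:
    """Standardize a signal; a constant or single-point signal maps to zeros."""
    mu = mean(signal)
    sigma = stdev(signal) if len(signal) > 1 else 0.0
    if sigma == 0:
        return [0.0 for _ in signal]
    return [(x - mu) / sigma for x in signal]


def find_storms(
    signal: List[float],
    expert_confirms: Callable[[int], bool],  # hypothetical expert-in-the-loop callback
    threshold: float = 3.0,       # assumed starting strictness
    step: float = 0.5,            # assumed amount to relax per iteration
    min_threshold: float = 1.0,   # assumed floor for the threshold
    min_precision: float = 0.5,   # assumed stopping rule
) -> List[int]:
    """Return sorted indices of time steps the expert confirmed as storms."""
    # Assumption: a storm may show up as either an unusually high or an
    # unusually low dispersion value, so the absolute z-score is used.
    scores = [abs(z) for z in zscores(signal)]
    reviewed = set()
    confirmed = set()
    while threshold >= min_threshold:
        # Unsupervised step: flag not-yet-reviewed points above the threshold.
        candidates = [
            i for i, s in enumerate(scores)
            if s >= threshold and i not in reviewed
        ]
        if candidates:
            # Human step: the expert validates each flagged anomaly.
            hits = {i for i in candidates if expert_confirms(i)}
            confirmed |= hits
            reviewed.update(candidates)
            # Feedback step: stop relaxing once most candidates are rejected.
            if len(hits) / len(candidates) < min_precision:
                break
        threshold -= step
    return sorted(confirmed)
```

In practice `expert_confirms` would present the articles from the flagged period to a human annotator, and the z-score rule could be swapped for any unsupervised detector; the sketch only shows how expert feedback from one iteration can steer the detector in the next.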
URL
https://arxiv.org/abs/2404.09299