Abstract
Social presence is central to the enjoyment of watching content together, yet modern media consumption is increasingly solitary. We investigate whether multi-agent conversational AI systems can recreate the dynamics of shared viewing experiences across diverse content types. We present CompanionCast, a general framework for orchestrating multiple role-specialized AI agents that respond to video content using multimodal inputs, speech synthesis, and spatial audio. Distinctly, CompanionCast integrates an LLM-as-a-Judge module that iteratively scores and refines conversations across five dimensions (relevance, authenticity, engagement, diversity, personality consistency). We validate this framework through sports viewing, a domain with rich dynamics and strong social traditions, where a pilot study with soccer fans suggests that multi-agent interaction improves perceived social presence compared to solo viewing. We contribute: (1) a generalizable framework for orchestrating multi-agent conversations around multimodal video content, (2) a novel evaluator-agent pipeline for conversation quality control, and (3) exploratory evidence of increased social presence in AI-mediated co-viewing. We discuss challenges and future directions for applying this approach to diverse viewing contexts including entertainment, education, and collaborative watching experiences.
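The score-and-refine loop described for the LLM-as-a-Judge module can be illustrated with a minimal sketch. The abstract names only the five dimensions and says that scoring is iterative. Everything else here is an assumption made for illustration and is not taken from the paper: the 0-to-1 score scale, the `threshold` and `max_rounds` parameters, the `refine` function, and the toy `toy_judge` and `toy_revise` stand-ins, which replace real LLM calls.

```python
from typing import Callable, Dict, List, Tuple

# The five dimensions named in the abstract.
DIMENSIONS = (
    "relevance",
    "authenticity",
    "engagement",
    "diversity",
    "personality_consistency",
)

Conversation = List[str]
Scores = Dict[str, float]


def refine(
    draft: Conversation,
    judge: Callable[[Conversation], Scores],
    revise: Callable[[Conversation, List[str]], Conversation],
    threshold: float = 0.7,  # assumed acceptance cut-off, not from the paper
    max_rounds: int = 3,     # assumed iteration budget, not from the paper
) -> Tuple[Conversation, Scores, int]:
    """Score a draft, revise its weak dimensions, and repeat until all pass."""
    conversation = list(draft)
    scores = judge(conversation)
    rounds = 0
    while rounds < max_rounds:
        weak = [d for d in DIMENSIONS if scores[d] < threshold]
        if not weak:
            break  # every dimension meets the threshold
        conversation = revise(conversation, weak)
        scores = judge(conversation)
        rounds += 1
    return conversation, scores, rounds


def toy_judge(conversation: Conversation) -> Scores:
    """Hypothetical stand-in for an LLM judge: score rises with each revision."""
    revisions = sum(1 for turn in conversation if turn.startswith("[revised"))
    score = min(1.0, 0.5 + 0.25 * revisions)
    return {d: score for d in DIMENSIONS}


def toy_revise(conversation: Conversation, weak: List[str]) -> Conversation:
    """Hypothetical stand-in for an LLM rewrite: append a revision marker."""
    return conversation + ["[revised for " + ", ".join(weak) + "]"]
```

Calling `refine(["A: What a goal!", "B: Offside, surely."], toy_judge, toy_revise)` runs one revision round with these toy stand-ins. In a real system the two callables would wrap model calls, and the stopping rule would depend on how the judge's scores are calibrated.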
URL
https://arxiv.org/abs/2512.10918