Abstract
Movie Audio Description (AD) aims to narrate visual content during dialogue-free segments, particularly benefiting blind and visually impaired (BVI) audiences. Compared with general video captioning, AD demands plot-relevant narration with explicit character name references, posing unique challenges in movie understanding. To identify active main characters and focus on storyline-relevant regions, we propose FocusedAD, a novel framework that delivers character-centric movie audio descriptions. It includes: (i) a Character Perception Module (CPM) for tracking character regions and linking them to names; (ii) a Dynamic Prior Module (DPM) that injects contextual cues from prior ADs and subtitles via learnable soft prompts; and (iii) a Focused Caption Module (FCM) that generates narrations enriched with plot-relevant details and named characters. To overcome limitations in character identification, we also introduce an automated pipeline for building character query banks. FocusedAD achieves state-of-the-art performance on multiple benchmarks, including strong zero-shot results on MAD-eval-Named and our newly proposed Cinepile-AD dataset. Code and data will be released at this https URL.
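The name-linking step of the Character Perception Module can be pictured as nearest-neighbour matching of tracked character regions against a character query bank. The following is only a minimal sketch under our own assumptions, not the FocusedAD implementation: we assume each tracked region already carries an appearance embedding, that the query bank maps each character name to a few reference embeddings, and that a cosine-similarity threshold rejects uncertain matches. The helper names `cosine` and `link_names`, the toy two-dimensional embeddings, and the threshold value are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def link_names(track_embeddings, query_bank, threshold=0.5):
    """Hypothetical name linking: give each tracked region its best-matching name.

    track_embeddings: dict mapping track_id -> embedding (list of floats)
    query_bank: dict mapping character name -> list of reference embeddings
    Returns a dict mapping track_id -> name, or None when no name's score
    strictly exceeds `threshold`.
    """
    assignments = {}
    for track_id, emb in track_embeddings.items():
        best_name, best_score = None, threshold
        for name, refs in query_bank.items():
            if not refs:
                # A name with no reference embeddings cannot be matched.
                continue
            # Score a name by its closest reference embedding.
            score = max(cosine(emb, r) for r in refs)
            if score > best_score:
                best_name, best_score = name, score
        assignments[track_id] = best_name
    return assignments


if __name__ == "__main__":
    bank = {"Rick": [[1.0, 0.0]], "Ilsa": [[0.0, 1.0]]}
    tracks = {0: [0.9, 0.1], 1: [0.1, 0.95], 2: [0.7, 0.7]}
    print(link_names(tracks, bank, threshold=0.8))
    # {0: 'Rick', 1: 'Ilsa', 2: None}
```

Leaving ambiguous tracks unnamed (track 2 above, which is equally similar to both names) reflects the assumption that an unnamed region is safer for AD than a wrongly named one; a real system would presumably use learned embeddings and a tuned threshold.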
URL
https://arxiv.org/abs/2504.12157