Abstract
Real-world videos often suffer from complex degradations, such as noise, compression artifacts, and low-light distortions, due to diverse acquisition and transmission conditions. Existing restoration methods typically require experts to manually select specialized models or rely on monolithic architectures that fail to generalize across varying degradations. Inspired by expert experience, we propose MoA-VR, the first Mixture-of-Agents Video Restoration system, which mimics the reasoning and processing procedures of human professionals through three coordinated agents: Degradation Identification, Routing and Restoration, and Restoration Quality Assessment. Specifically, we construct a large-scale, high-resolution video degradation recognition benchmark and build a vision-language model (VLM) driven degradation identifier. We further introduce a self-adaptive router powered by large language models (LLMs), which autonomously learns effective restoration strategies by observing tool usage patterns. To assess intermediate and final processed video quality, we construct the Restored Video Quality (Res-VQ) dataset and design a dedicated VLM-based video quality assessment (VQA) model tailored for restoration tasks. Extensive experiments demonstrate that MoA-VR effectively handles diverse and compound degradations, consistently outperforming existing baselines in terms of both objective metrics and perceptual quality. These results highlight the potential of integrating multimodal intelligence and modular reasoning in general-purpose video restoration systems.
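To make the three-agent workflow concrete, the sketch below illustrates the coordination loop the abstract describes (identify degradations, route to restoration tools, restore, assess quality, and iterate). This is not the authors' implementation: every function, tool name, threshold, and the toy scoring logic are hypothetical placeholders standing in for the VLM identifier, LLM router, and VQA model.

```python
# Minimal sketch of the MoA-VR-style loop described in the abstract.
# All names, tools, and scores below are hypothetical stand-ins.
from typing import Callable, Dict, List

# Hypothetical restoration toolbox: tool name -> callable returning a processed video handle.
TOOLS: Dict[str, Callable[[str], str]] = {
    "denoise": lambda v: v + "+denoised",
    "deblock": lambda v: v + "+deblocked",
    "relight": lambda v: v + "+relit",
}

def identify_degradations(video: str) -> List[str]:
    """Stand-in for the VLM-based degradation identifier."""
    return ["noise", "compression"]  # e.g., a compound degradation

def route(degradations: List[str]) -> List[str]:
    """Stand-in for the LLM-based self-adaptive router mapping degradations to tools."""
    mapping = {"noise": "denoise", "compression": "deblock", "low-light": "relight"}
    return [mapping[d] for d in degradations if d in mapping]

def assess_quality(video: str) -> float:
    """Stand-in for the restoration-oriented VQA model (higher is better)."""
    return 0.6 + 0.2 * video.count("+")  # toy score rewarding applied restoration steps

def restore(video: str, target_score: float = 0.9, max_rounds: int = 3) -> str:
    """Iterate identify -> route -> restore -> assess until quality is acceptable."""
    for _ in range(max_rounds):
        if assess_quality(video) >= target_score:
            break
        for tool in route(identify_degradations(video)):
            video = TOOLS[tool](video)
    return video

print(restore("input_video"))
```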
URL
https://arxiv.org/abs/2510.08508