Abstract
Talking head synthesis, an advanced method for generating portrait videos from a still image driven by specific content, has garnered widespread attention in virtual reality, augmented reality and game production. Recently, significant breakthroughs have been made with the introduction of novel models such as the transformer and the diffusion model. Current methods can not only generate new content but also edit the generated material. This survey systematically reviews the technology, categorizing it into three pivotal domains: portrait generation, driven mechanisms, and editing techniques. We summarize milestone studies and critically analyze their innovations and shortcomings within each domain. Additionally, we organize an extensive collection of datasets and provide a thorough performance analysis of current methodologies based on various evaluation metrics, aiming to furnish a clear framework and robust data support for future research. Finally, we explore application scenarios of talking head synthesis, illustrate them with specific cases, and examine potential future directions.
Abstract (translated)
谈话头合成是一种先进的方法,用于从特定内容驱动的静态图像中生成视频肖像。它在虚拟现实、增强现实和游戏制作领域引起了广泛关注。近年来,随着新模型的引入,如Transformer和扩散模型,取得了显著的突破。现有的方法不仅能生成新的内容,还可以编辑生成材料。本调查系统地回顾了该技术,将其分为三个关键领域:肖像生成、驱动机制和编辑技术。我们总结了里程碑式的研究,并对其创新和不足进行了批判性分析。此外,我们组织了一组丰富的数据集,对各种评估指标进行了详细的表现分析,旨在为未来的研究提供一个清晰、可靠的框架和数据支持。最后,我们探讨了谈话头合成应用场景,用具体案例进行了说明,并探讨了未来的研究方向。
URL
https://arxiv.org/abs/2406.10553