Abstract
Human understanding and generation are critical for modeling digital humans and humanoid embodiments. Recently, Human-centric Foundation Models (HcFMs), inspired by the success of generalist models such as large language and vision models, have emerged to unify diverse human-centric tasks into a single framework, surpassing traditional task-specific approaches. In this survey, we present a comprehensive overview of HcFMs by proposing a taxonomy that categorizes current approaches into four groups: (1) Human-centric Perception Foundation Models, which capture fine-grained features for multi-modal 2D and 3D understanding; (2) Human-centric AIGC Foundation Models, which generate high-fidelity, diverse human-related content; (3) Unified Perception and Generation Models, which integrate these capabilities to enhance both human understanding and synthesis; and (4) Human-centric Agentic Foundation Models, which extend beyond perception and generation to learn human-like intelligence and interactive behaviors for humanoid embodied tasks. We review state-of-the-art techniques and discuss emerging challenges and future research directions. This survey aims to serve as a roadmap for researchers and practitioners working towards more robust, versatile, and intelligent digital human and embodiment modeling.
URL
https://arxiv.org/abs/2502.08556