Paper Reading AI Learner

Expressive Whole-Body 3D Gaussian Avatar

2024-07-31 15:29:13
Gyeongsik Moon, Takaaki Shiratori, Shunsuke Saito

Abstract

Facial expressions and hand motions are necessary to express our emotions and interact with the world. Nevertheless, most 3D human avatars modeled from a casually captured video only support body motions, without facial expressions and hand motions. In this work, we present ExAvatar, an expressive whole-body 3D human avatar learned from a short monocular video. We design ExAvatar as a combination of the whole-body parametric mesh model (SMPL-X) and 3D Gaussian Splatting (3DGS). The main challenges are 1) the limited diversity of facial expressions and poses in the video and 2) the absence of 3D observations, such as 3D scans and RGBD images. The limited diversity in the video makes animation with novel facial expressions and poses non-trivial. In addition, the absence of 3D observations can cause significant ambiguity in body parts that are not observed in the video, which can result in noticeable artifacts under novel motions. To address these challenges, we introduce a hybrid representation of the mesh and 3D Gaussians. Our hybrid representation treats each 3D Gaussian as a vertex on the surface, with pre-defined connectivity information (i.e., triangle faces) between the Gaussians that follows the mesh topology of SMPL-X. This makes ExAvatar animatable with novel facial expressions, driven by the facial expression space of SMPL-X. In addition, by using connectivity-based regularizers, we significantly reduce artifacts in novel facial expressions and poses.
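
The idea of Gaussians that double as mesh vertices can be illustrated with a toy Python sketch. It is not ExAvatar's implementation: the names HybridAvatar and laplacian_penalty are hypothetical, the motion model is reduced to linear expression blendshapes (SMPL-X shape, pose correctives, and skinning are omitted), each Gaussian carries only a single isotropic scale, and the uniform Laplacian penalty is assumed here as one plausible connectivity-based regularizer rather than taken from the paper. It shows how fixed triangle faces give every Gaussian a set of neighbours, how expression coefficients move the Gaussian centres, and how those neighbourhoods can penalise attributes that deviate from their surroundings.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class HybridAvatar:
    """Toy hybrid mesh/Gaussian avatar (hypothetical, not ExAvatar's code).

    Each Gaussian centre is a mesh vertex; the triangle faces give the
    fixed connectivity between Gaussians, as in a mesh topology.
    """
    template: List[Vec3]                # neutral vertex (Gaussian centre) positions
    faces: List[Tuple[int, int, int]]   # triangle faces over vertex indices
    expr_basis: List[List[Vec3]]        # expr_basis[k][v]: offset of vertex v in blendshape k
    scales: List[float]                 # one isotropic Gaussian scale per vertex (simplification)

    def neighbours(self) -> List[Set[int]]:
        # Build vertex adjacency from the edges of every triangle face.
        nbrs: List[Set[int]] = [set() for _ in self.template]
        for a, b, c in self.faces:
            for i, j in ((a, b), (b, c), (c, a)):
                nbrs[i].add(j)
                nbrs[j].add(i)
        return nbrs

    def posed_centres(self, expr: Sequence[float]) -> List[Vec3]:
        # Linear expression blendshapes: v = T + sum_k psi_k * E_k.
        out: List[Vec3] = []
        for v, (x, y, z) in enumerate(self.template):
            for k, w in enumerate(expr):
                dx, dy, dz = self.expr_basis[k][v]
                x, y, z = x + w * dx, y + w * dy, z + w * dz
            out.append((x, y, z))
        return out


def laplacian_penalty(values: Sequence[float], nbrs: List[Set[int]]) -> float:
    """Uniform Laplacian smoothness: sum_i (x_i - mean of x over neighbours of i)^2."""
    total = 0.0
    for i, n in enumerate(nbrs):
        if n:  # isolated vertices contribute nothing
            mean = sum(values[j] for j in n) / len(n)
            total += (values[i] - mean) ** 2
    return total
```

On a single quad split into the two triangles (0, 1, 2) and (0, 2, 3), an expression coefficient of 0.5 on a blendshape that lifts vertex 2 by one unit moves that Gaussian's centre to (1.0, 1.0, 0.5), and per-vertex scales of [1, 1, 4, 1] give a penalty of 14.5, most of it (9.0) from the outlier at vertex 2. Because the neighbourhoods come from the faces rather than from distances between Gaussians, they stay fixed as the avatar deforms.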

URL

https://arxiv.org/abs/2407.21686

PDF

https://arxiv.org/pdf/2407.21686.pdf
