Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance

2025-01-09 17:04:33
Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias, Alexandros Lattas, Stefanos Zafeiriou
       

Abstract

Inspired by the effectiveness of 3D Gaussian Splatting (3DGS) in reconstructing detailed 3D scenes within multi-view setups and the emergence of large 2D human foundation models, we introduce Arc2Avatar, the first SDS-based method utilizing a human face foundation model as guidance with just a single image as input. To achieve that, we extend such a model for diverse-view human head generation by fine-tuning on synthetic data and modifying its conditioning. Our avatars maintain a dense correspondence with a human face mesh template, allowing blendshape-based expression generation. This is achieved through a modified 3DGS approach, connectivity regularizers, and a strategic initialization tailored for our task. Additionally, we propose an optional efficient SDS-based correction step to refine the blendshape expressions, enhancing realism and diversity. Experiments demonstrate that Arc2Avatar achieves state-of-the-art realism and identity preservation, effectively addressing color issues by allowing the use of very low guidance, enabled by our strong identity prior and initialization strategy, without compromising detail.
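
The abstract states that the avatars keep a dense correspondence with a face mesh template, which is what makes blendshape-based expressions possible. As a rough illustration only, the sketch below shows a generic linear blendshape model applied to point positions. It is not the authors' implementation: the function name `apply_blendshapes`, the data layout, and the assumption of one Gaussian center per template vertex are all hypothetical and chosen for this example.

```python
from typing import List, Sequence

Point = Sequence[float]


def apply_blendshapes(
    neutral: Sequence[Point],
    deltas: Sequence[Sequence[Point]],
    weights: Sequence[float],
) -> List[List[float]]:
    """Generic linear blendshape model (illustrative, not the paper's code).

    neutral: N rest positions, e.g. hypothetical Gaussian centers that
             correspond one-to-one with template mesh vertices.
    deltas:  K blendshapes, each a list of N offsets from the neutral shape.
    weights: K expression coefficients.

    Returns neutral[i] + sum_k weights[k] * deltas[k][i] for every point i.
    """
    if len(deltas) != len(weights):
        raise ValueError("need exactly one weight per blendshape")

    # Copy the neutral positions so the input is left untouched.
    out = [list(p) for p in neutral]

    for w, delta in zip(weights, deltas):
        if len(delta) != len(neutral):
            raise ValueError("each blendshape needs one offset per point")
        for i, offset in enumerate(delta):
            for axis in range(3):
                out[i][axis] += w * offset[axis]
    return out


# Two points, two made-up blendshapes.
neutral = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
smile = [[0.0, 1.0, 0.0], [0.0, 2.0, 0.0]]
jaw_open = [[0.0, 0.0, -1.0], [0.0, 0.0, 0.0]]

posed = apply_blendshapes(neutral, [smile, jaw_open], [0.5, 1.0])
```

Because the offsets are indexed by template vertex, the same expression coefficients can drive any avatar that preserves this correspondence. How the paper maintains that correspondence during optimization (its modified 3DGS approach, connectivity regularizers, and initialization) is described in the paper itself, not in this sketch.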

URL

https://arxiv.org/abs/2501.05379

PDF

https://arxiv.org/pdf/2501.05379.pdf

