Efficient Meshy Neural Fields for Animatable Human Avatars

2023-03-23 00:15:34
Xiaoke Huang, Yiji Cheng, Yansong Tang, Xiu Li, Jie Zhou, Jiwen Lu

Abstract

Efficiently digitizing high-fidelity animatable human avatars from videos is a challenging and active research topic. Recent volume-rendering-based neural representations open a new way for human digitization with their friendly usability and photo-realistic reconstruction quality. However, they are inefficient, suffering from long optimization times and slow inference speeds, and their implicit nature entangles the geometry, materials, and dynamics of humans, which are therefore hard to edit afterward. Such drawbacks prevent their direct application downstream, especially in prominent rasterization-based graphics pipelines. We present EMA, a method that Efficiently learns Meshy neural fields to reconstruct animatable human Avatars. It jointly optimizes an explicit triangular canonical mesh, spatially-varying materials, and motion dynamics via inverse rendering in an end-to-end fashion. Each of these components is derived from a separate neural field, relaxing the requirement for a template or rigging. The mesh representation is highly compatible with efficient rasterization-based renderers, so our method takes only about an hour of training and renders in real time. Moreover, just minutes of optimization are enough for plausible reconstruction results. The disentangled meshes enable direct downstream applications. Extensive experiments illustrate highly competitive performance and a significant speed boost over previous methods. We also showcase applications including novel pose synthesis, material editing, and relighting. The project page: this https URL.
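
The abstract describes three disentangled components, each backed by its own neural field: canonical geometry, spatially-varying material, and motion dynamics, optimized jointly by inverse rendering. The sketch below (PyTorch, not the authors' code) shows only that structure: the field dimensions, the 72-dim SMPL-style pose vector, and the stand-in loss are all assumptions made to keep the example self-contained and runnable. The actual method instead extracts a triangle mesh from the geometry field and optimizes all fields through a differentiable rasterizer against the input video frames.

```python
# Minimal structural sketch of three disentangled neural fields
# optimized jointly, as outlined in the abstract. Hypothetical sizes
# and losses; the real pipeline rasterizes an extracted mesh.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128, depth=3):
    """Small fully connected network used for each field."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

# Separate fields: geometry, material, and motion are kept apart
# rather than baked into one entangled implicit function.
geometry_field = mlp(3, 1)        # canonical point -> SDF value
material_field = mlp(3, 5)        # canonical point -> albedo(3) + roughness + specular (assumed split)
motion_field   = mlp(3 + 72, 3)   # canonical point + pose -> deformation offset
                                  # (72-dim SMPL-style pose vector is an assumption)

params = (list(geometry_field.parameters())
          + list(material_field.parameters())
          + list(motion_field.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

for step in range(100):
    pts = torch.rand(1024, 3) * 2 - 1        # sample canonical-space points
    pose = torch.randn(72).expand(1024, 72)  # hypothetical per-frame pose

    sdf = geometry_field(pts)                # implicit surface in canonical space
    materials = material_field(pts)          # spatially-varying material parameters
    deformed = pts + motion_field(torch.cat([pts, pose], dim=-1))  # pose the points

    # Stand-in objective: the actual method rasterizes the extracted mesh
    # and compares renderings against video frames (inverse rendering).
    loss = sdf.abs().mean() + materials.pow(2).mean() + (deformed - pts).pow(2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because all three fields receive gradients through one rendering loss, geometry, materials, and motion stay separately editable after training, which is what enables the material-editing and relighting applications mentioned above.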


URL

https://arxiv.org/abs/2303.12965

PDF

https://arxiv.org/pdf/2303.12965.pdf

