Efficient Meshy Neural Fields for Animatable Human Avatars

2023-03-23 00:15:34
Xiaoke Huang, Yiji Cheng, Yansong Tang, Xiu Li, Jie Zhou, Jiwen Lu

Abstract

Efficiently digitizing high-fidelity animatable human avatars from videos is a challenging and active research topic. Recent volume-rendering-based neural representations open a new way for human digitization with their friendly usability and photo-realistic reconstruction quality. However, they are inefficient, suffering from long optimization times and slow inference speeds, and their implicit nature entangles the geometry, materials, and dynamics of humans, which are therefore hard to edit afterward. Such drawbacks prevent their direct application downstream, especially in prominent rasterization-based graphics pipelines. We present EMA, a method that Efficiently learns Meshy neural fields to reconstruct animatable human Avatars. It jointly optimizes an explicit triangular canonical mesh, spatially-varying materials, and motion dynamics via inverse rendering in an end-to-end fashion. Each of these components is derived from a separate neural field, relaxing the requirement for a template or rigging. The mesh representation is highly compatible with efficient rasterization-based renderers, so our method takes only about an hour of training and renders in real time. Moreover, just minutes of optimization are enough for plausible reconstruction results. The disentangled meshes enable direct downstream applications. Extensive experiments illustrate highly competitive performance and a significant speed boost over previous methods. We also showcase applications including novel pose synthesis, material editing, and relighting. The project page: this https URL.
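
The abstract describes three disentangled components, each backed by its own neural field: canonical geometry, spatially-varying material, and motion dynamics, optimized jointly by inverse rendering. The sketch below (PyTorch, not the authors' code) shows only that structure: the field dimensions, the 72-dim SMPL-style pose vector, and the stand-in loss are all assumptions made to keep the example self-contained and runnable. The actual method instead extracts a triangle mesh from the geometry field and optimizes all fields through a differentiable rasterizer against the input video frames.

```python
# Minimal structural sketch of three disentangled neural fields
# optimized jointly, as outlined in the abstract. Hypothetical sizes
# and losses; the real pipeline rasterizes an extracted mesh.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128, depth=3):
    """Small fully connected network used for each field."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

# Separate fields: geometry, material, and motion are kept apart
# rather than baked into one entangled implicit function.
geometry_field = mlp(3, 1)        # canonical point -> SDF value
material_field = mlp(3, 5)        # canonical point -> albedo(3) + roughness + specular (assumed split)
motion_field   = mlp(3 + 72, 3)   # canonical point + pose -> deformation offset
                                  # (72-dim SMPL-style pose vector is an assumption)

params = (list(geometry_field.parameters())
          + list(material_field.parameters())
          + list(motion_field.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

for step in range(100):
    pts = torch.rand(1024, 3) * 2 - 1        # sample canonical-space points
    pose = torch.randn(72).expand(1024, 72)  # hypothetical per-frame pose

    sdf = geometry_field(pts)                # implicit surface in canonical space
    materials = material_field(pts)          # spatially-varying material parameters
    deformed = pts + motion_field(torch.cat([pts, pose], dim=-1))  # pose the points

    # Stand-in objective: the actual method rasterizes the extracted mesh
    # and compares renderings against video frames (inverse rendering).
    loss = sdf.abs().mean() + materials.pow(2).mean() + (deformed - pts).pow(2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because all three fields receive gradients through one rendering loss, geometry, materials, and motion stay separately editable after training, which is what enables the material-editing and relighting applications mentioned above.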


URL

https://arxiv.org/abs/2303.12965

PDF

https://arxiv.org/pdf/2303.12965.pdf

