MagicPony: Learning Articulated 3D Animals in the Wild

2022-11-22 18:59:31

Shangzhe Wu, Ruining Li, Tomas Jakab, Christian Rupprecht, Andrea Vedaldi

arXiv_CV

arXiv_CV Knowledge Quantitative Transformer Pose 3D Self-Supervised

Abstract
Abstract (translated)
URL
PDF

Abstract

We consider the problem of learning a function that can estimate the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse, given a single test image. We present a new method, dubbed MagicPony, that learns this function purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes. In order to help the model understand an object's shape and pose, we distil the knowledge captured by an off-the-shelf self-supervised vision transformer and fuse it into the 3D model. To overcome common local optima in viewpoint estimation, we further introduce a new viewpoint sampling scheme that comes at no added training cost. Compared to prior works, we show significant quantitative and qualitative improvements on this challenging task. The model also demonstrates excellent generalisation in reconstructing abstract drawings and artefacts, despite the fact that it is only trained on real images.

Abstract (translated)

URL

https://arxiv.org/abs/2211.12497

PDF

https://arxiv.org/pdf/2211.12497.pdf