Equivariant Multi-View Networks

2019-04-01 17:58:17

Carlos Esteves, Yinshuang Xu, Christine Allen-Blanchette, Kostas Daniilidis

arXiv_CV

arXiv_CV CNN Classification Pose Action 3D Scene_Classification

Abstract

Several approaches to 3D vision tasks process multiple views of the input independently with deep neural networks pre-trained on natural images, achieving view permutation invariance through a single round of pooling over all views. We argue that this operation discards important information and leads to subpar global descriptors. In this paper, we propose a group convolutional approach to multiple-view aggregation in which convolutions are performed over a discrete subgroup of the rotation group, thus enabling joint reasoning over all views in an equivariant (instead of invariant) fashion up to the very last layer. We further develop this idea to operate on smaller discrete homogeneous spaces of the rotation group, where a polar view representation is used to maintain equivariance with only a fraction of the number of input views. We set a new state of the art in several large-scale 3D shape retrieval tasks and show additional applications to panoramic scene classification.
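
The abstract contrasts a single round of view pooling with group convolution over a discrete rotation group. The small sketch below illustrates that contrast under stated assumptions. It uses the cyclic group C_n (n views spaced evenly around one axis) as a stand-in for the discrete subgroup of the rotation group, because this abstract does not say which group or filter parameterization the paper uses. The helper names group_conv_cyclic, rotate_views and max_pool_views are hypothetical. The checks show equivariance: rotating the input views shifts the convolved output by the same rotation, while max pooling collapses the views to one rotation-invariant vector.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one filter tap: C_out rows x C_in columns


def group_conv_cyclic(features: List[Vector], filters: List[Matrix]) -> List[Vector]:
    """Group convolution over the cyclic group C_n (hypothetical helper).

    features[h] is the C_in-dim descriptor of the view at group element h.
    filters[k] is the C_out x C_in matrix applied at relative rotation k.
    out[g] = sum_h filters[(g - h) % n] @ features[h], i.e. psi(h^-1 g)
    written in additive notation for C_n.
    """
    n = len(features)
    assert len(filters) == n, "expects one filter matrix per group element"
    c_out = len(filters[0])
    out: List[Vector] = []
    for g in range(n):
        acc = [0.0] * c_out
        for h in range(n):
            w = filters[(g - h) % n]
            x = features[h]
            for o in range(c_out):
                acc[o] += sum(w[o][i] * x[i] for i in range(len(x)))
        out.append(acc)
    return out


def rotate_views(features: List[Vector], k: int) -> List[Vector]:
    """Left action of rotation k on a view signal: (L_k f)(h) = f(h - k)."""
    n = len(features)
    return [features[(h - k) % n] for h in range(n)]


def max_pool_views(features: List[Vector]) -> Vector:
    """Single round of pooling over all views: invariant, but the view layout is lost."""
    return [max(column) for column in zip(*features)]


# Four views (group C_4), two channels each; integer values keep float sums exact.
views = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 0.0]]
# One 1x2 filter matrix per group element (C_out = 1).
filters = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]], [[2.0, 0.0]]]

out = group_conv_cyclic(views, filters)
print(out)  # [[5.0], [6.0], [6.0], [5.0]]

# Equivariance: convolving rotated views equals rotating the convolved output.
print(group_conv_cyclic(rotate_views(views, 1), filters) == rotate_views(out, 1))  # True

# Pooling is invariant: the same vector for every rotation of the input.
print(max_pool_views(views), max_pool_views(rotate_views(views, 1)))  # [3.0, 2.0] [3.0, 2.0]
```

In a deeper network, several such equivariant layers would be stacked and a pool applied only at the end, so the views are reasoned about jointly until the last layer, which is the ordering the abstract argues for.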

URL

https://arxiv.org/abs/1904.00993

PDF

https://arxiv.org/pdf/1904.00993.pdf