Abstract
Geometric deep learning is increasingly important thanks to the popularity of 3D sensors. Inspired by recent advances in NLP, we introduce the self-attention transformer to process point clouds. We develop Point Attention Transformers (PATs), which use a parameter-efficient Group Shuffle Attention (GSA) in place of the costly Multi-Head Attention. We demonstrate its ability to process inputs of varying size and prove its permutation equivariance. Prior work relies on heuristics that depend on the input data (e.g., Furthest Point Sampling) to hierarchically select subsets of input points. We instead propose, for the first time, an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), that selects a representative subset of input points. Equipped with Gumbel-Softmax, GSS produces a "soft" continuous subset during training and a "hard" discrete subset at test time. By selecting representative subsets hierarchically, the networks learn a stronger representation of the input sets at lower computational cost. Experiments on classification and segmentation benchmarks show the effectiveness and efficiency of our methods. Furthermore, we propose a novel application, processing event camera streams as point clouds, and achieve state-of-the-art performance on the DVS128 Gesture Dataset.
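The "soft" versus "hard" subset idea can be illustrated with a minimal sketch of Gumbel-Softmax selection. This is not the authors' implementation: the function names, the `scores` layout (one row of selection logits per output slot), the temperature `tau`, and the choice to take the argmax of the noisy weights in hard mode are all assumptions made for illustration. In the paper the logits would come from a learned layer; here they are passed in directly.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Return relaxed one-hot weights: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform; clamp u away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def select_points(points, scores, tau=1.0, hard=False, seed=0):
    """Select one output point per row of `scores` (hypothetical layout).

    scores[j][i] is the selection logit of input point i for output slot j.
    Soft mode returns a weighted average of all points (differentiable in a
    real framework); hard mode returns one actual input point per slot.
    Distinct slots are not forced to pick distinct points in this sketch.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    selected = []
    for row in scores:
        weights = gumbel_softmax(row, tau, rng)
        if hard:
            # Assumption: discrete choice = argmax of the noisy weights.
            idx = max(range(len(weights)), key=weights.__getitem__)
            selected.append(list(points[idx]))
        else:
            selected.append([
                sum(w * p[d] for w, p in zip(weights, points))
                for d in range(dim)
            ])
    return selected


if __name__ == "__main__":
    pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
    logits = [[0.1, 0.2, 0.3, 0.4], [2.0, 0.0, 0.0, 0.0]]
    print(select_points(pts, logits, tau=1.0, hard=False))  # blended points
    print(select_points(pts, logits, tau=1.0, hard=True))   # actual input points
```

Lowering `tau` pushes the soft weights toward a one-hot vector, so the soft output approaches the hard one; that is the usual motivation for annealing the temperature during training.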
URL
https://arxiv.org/abs/1904.03375