Abstract
Self-supervised learning is attracting considerable attention in point cloud understanding. However, learning discriminative and transferable features remains challenging due to the irregularity and sparsity of point clouds. We propose a geometrically and adaptively masked auto-encoder for self-supervised learning on point clouds, termed \textit{PointGame}. PointGame contains two core components: GATE and EAT. GATE stands for the geometrical and adaptive token embedding module; it not only absorbs the conventional wisdom of geometric descriptors, which capture surface shape effectively, but also exploits adaptive saliency to focus on the salient parts of a point cloud. EAT stands for the external attention-based Transformer encoder with linear computational complexity, which improves the efficiency of the whole pipeline. Unlike cutting-edge unsupervised learning models, PointGame leverages geometric descriptors to perceive surface shapes and adaptively mines discriminative features from training data. PointGame shows clear advantages over its competitors on various downstream tasks under both global and local fine-tuning strategies. The code and pre-trained models will be publicly available.
URL
https://arxiv.org/abs/2303.13100