Paper Reading AI Learner

PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds

2023-03-23 08:32:10
Yun Liu, Xuefeng Yan, Zhilei Chen, Zhiqi Li, Zeyong Wei, Mingqiang Wei

Abstract

Self-supervised learning is attracting increasing attention in point cloud understanding. However, exploring discriminative and transferable features remains challenging due to the irregular and sparse nature of point clouds. We propose a geometrically and adaptively masked auto-encoder for self-supervised learning on point clouds, termed PointGame. PointGame contains two core components: GATE and EAT. GATE stands for the geometrical and adaptive token embedding module; it not only absorbs the conventional wisdom of geometric descriptors that capture surface shapes effectively, but also exploits adaptive saliency to focus on the salient parts of a point cloud. EAT stands for the external attention-based Transformer encoder with linear computational complexity, which increases the efficiency of the whole pipeline. Unlike cutting-edge unsupervised learning models, PointGame leverages geometric descriptors to perceive surface shapes and adaptively mines discriminative features from the training data. PointGame showcases clear advantages over its competitors on various downstream tasks under both global and local fine-tuning strategies. The code and pre-trained models will be publicly available.
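The EAT component described above builds on external attention, which replaces quadratic token-to-token self-attention with two small learnable memory units so that cost grows linearly in the number of tokens. Below is a minimal PyTorch sketch of such a block; the class name, memory size, and double-normalization details are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of an external-attention block with linear complexity in
# the number of tokens. Layer sizes and normalization are assumptions for
# illustration only, not the authors' EAT encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExternalAttention(nn.Module):
    """Tokens attend to a small learnable memory (two linear layers),
    so the cost is O(N * S) instead of O(N^2) for N tokens."""

    def __init__(self, dim: int, memory_size: int = 64):
        super().__init__()
        self.memory_k = nn.Linear(dim, memory_size, bias=False)  # F @ M_k^T
        self.memory_v = nn.Linear(memory_size, dim, bias=False)  # A @ M_v
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) token embeddings, e.g. one per point-cloud patch
        attn = self.memory_k(x)                                  # (B, N, S)
        attn = F.softmax(attn, dim=1)                            # normalize over tokens
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)    # then over memory slots
        out = self.memory_v(attn)                                # (B, N, dim)
        return self.norm(x + out)                                # residual + norm


if __name__ == "__main__":
    tokens = torch.randn(2, 128, 256)            # 2 clouds, 128 patch tokens, 256-d
    block = ExternalAttention(dim=256, memory_size=64)
    print(block(tokens).shape)                   # torch.Size([2, 128, 256])
```

Because the memory size S is fixed and much smaller than the token count N, stacking such blocks keeps the encoder's attention cost linear in N, which is the efficiency argument the abstract makes for EAT.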

Abstract (translated)

Self-supervised learning has attracted wide attention in point cloud understanding. However, exploring discriminative and transferable features remains challenging due to the irregular and sparse nature of point clouds. We propose a geometrically and adaptively masked auto-encoder for self-supervised learning on point clouds, termed PointGame. PointGame contains two core components: GATE and EAT. GATE denotes the geometrical and adaptive token embedding module; it not only absorbs the conventional wisdom of geometric descriptors that capture surface shapes effectively, but also exploits adaptive saliency to focus on the salient parts of a point cloud. EAT denotes the external attention-based Transformer encoder with linear computational complexity, which improves the efficiency of the whole pipeline. Unlike state-of-the-art unsupervised learning models, PointGame leverages geometric descriptors to perceive surface shapes and adaptively mines discriminative features from the training data. PointGame demonstrates clear advantages over its competitors on various downstream tasks under both global and local fine-tuning strategies. The code and pre-trained models will be publicly available.

URL

https://arxiv.org/abs/2303.13100

PDF

https://arxiv.org/pdf/2303.13100.pdf

