GTNet: Graph Transformer Network for 3D Point Cloud Classification and Semantic Segmentation

2023-05-24 14:51:18
Wei Zhou, Qian Wang, Weiwei Jin, Xinzhe Shi, Dekui Wang, Xingxing Hao, Yongxiang Yu

Abstract

Recently, graph-based and Transformer-based deep learning networks have demonstrated excellent performance on various point cloud tasks. Most existing graph methods are based on static graphs, which take a fixed input to establish graph relations. Moreover, many graph methods apply maximization or averaging to aggregate neighboring features, so that either only a single neighboring point affects the centroid's feature or all neighboring points have the same influence on it, which ignores the correlations and differences between points. Most Transformer-based methods extract point cloud features with global attention and lack feature learning on local neighbors. To solve the problems of these two types of models, we propose a new feature extraction block named Graph Transformer and construct a 3D point cloud learning network called GTNet to learn features of point clouds on local and global patterns. Graph Transformer integrates the advantages of graph-based and Transformer-based methods, and consists of Local Transformer and Global Transformer modules. Local Transformer uses a dynamic graph to calculate all neighboring point weights by intra-domain cross-attention with dynamically updated graph relations, so that every neighboring point can affect the centroid's features with a different weight; Global Transformer enlarges the receptive field of Local Transformer by global self-attention. In addition, to avoid vanishing gradients as the network grows deeper, we apply residual connections to the centroid features in GTNet; we also use the features of the centroid and its neighbors to generate local geometric descriptors in Local Transformer, which strengthens the model's ability to learn local information. Finally, we use GTNet for shape classification, part segmentation and semantic segmentation tasks in this paper.
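
The contrast the abstract draws between max/average aggregation and attention-weighted aggregation can be illustrated with a small standard-library Python sketch. It is not the authors' implementation. The function names (`knn`, `aggregate`, `local_layer`), the plain scaled dot-product scoring without learned projections, and the omission of the local geometric descriptors are all assumptions made for illustration.

```python
import math

def knn(features, i, k):
    """Indices of the k nearest neighbors of point i (excluding i itself),
    measured by squared Euclidean distance in the current feature space."""
    dists = []
    for j, f in enumerate(features):
        if j != i:
            d = sum((a - b) ** 2 for a, b in zip(features[i], f))
            dists.append((d, j))
    dists.sort()
    return [j for _, j in dists[:k]]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(features, i, k, mode):
    """Aggregate the neighbor features of centroid i.

    "max":       per channel, only one neighbor contributes.
    "mean":      every neighbor contributes with the same weight.
    "attention": every neighbor contributes with its own weight.
    """
    nbrs = [features[j] for j in knn(features, i, k)]
    dim = len(features[i])
    if mode == "max":
        return [max(n[d] for n in nbrs) for d in range(dim)]
    if mode == "mean":
        return [sum(n[d] for n in nbrs) / len(nbrs) for d in range(dim)]
    if mode == "attention":
        # Assumption: the centroid acts as the query and the neighbors act
        # as keys and values, with no learned projection matrices.
        scale = math.sqrt(dim)
        scores = [sum(q * c for q, c in zip(features[i], n)) / scale
                  for n in nbrs]
        weights = softmax(scores)
        return [sum(w * n[d] for w, n in zip(weights, nbrs))
                for d in range(dim)]
    raise ValueError("unknown mode: " + mode)

def local_layer(features, k):
    """One hypothetical local layer: attention aggregation plus a residual
    connection on the centroid feature. Because knn() runs on whatever
    features are passed in, stacking calls rebuilds the graph from the
    updated features (a dynamic graph)."""
    return [[c + a for c, a in
             zip(features[i], aggregate(features, i, k, "attention"))]
            for i in range(len(features))]
```

Calling `local_layer` repeatedly, for example `local_layer(local_layer(points, 2), 2)`, recomputes the neighbor sets at each step, whereas a static-graph variant would compute `knn` once on the input and reuse it.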

URL

https://arxiv.org/abs/2305.15213

PDF

https://arxiv.org/pdf/2305.15213.pdf

