Abstract
Recently, graph-based and Transformer-based deep learning networks have demonstrated excellent performance on various point cloud tasks. Most existing graph methods rely on static graphs, which take a fixed input to establish graph relations. Moreover, many graph methods aggregate neighboring features by maximization or averaging, so that either only a single neighboring point affects the feature of the centroid, or all neighboring points have the same influence on it; both cases ignore the correlations and differences between points. Most Transformer-based methods extract point cloud features with global attention and lack feature learning on local neighbors. To solve the problems of these two types of models, we propose a new feature extraction block named Graph Transformer and construct a 3D point cloud learning network called GTNet to learn features of point clouds on local and global patterns. Graph Transformer integrates the advantages of graph-based and Transformer-based methods, and consists of Local Transformer and Global Transformer modules. Local Transformer uses a dynamic graph, whose relations are updated dynamically, and computes a weight for every neighboring point by intra-domain cross-attention, so that every neighboring point can affect the features of the centroid with a different weight; Global Transformer enlarges the receptive field of Local Transformer through global self-attention. In addition, to avoid the vanishing gradients caused by increasing network depth, we apply residual connections to the centroid features in GTNet; we also use the features of the centroid and its neighbors to generate local geometric descriptors in Local Transformer, which strengthens the model's ability to learn local information. Finally, we apply GTNet to shape classification, part segmentation, and semantic segmentation tasks.
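The contrast the abstract draws between max or average aggregation and attention-weighted aggregation over a dynamically rebuilt graph can be illustrated with a small sketch. This is not the GTNet implementation: the function names (`knn_indices`, `attention_aggregate`, `max_aggregate`, `mean_aggregate`) are hypothetical, the query, key, and value projections are assumed to be identity maps rather than learned layers, and the local geometric descriptors and the Global Transformer are omitted.

```python
import math

def knn_indices(features, k):
    """For each point, indices of its k nearest neighbors in feature space (self excluded)."""
    result = []
    for i, fi in enumerate(features):
        dists = sorted((math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i)
        result.append([j for _, j in dists[:k]])
    return result

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def max_aggregate(features, neighbors):
    """Per-channel max: each channel is decided by a single neighbor."""
    d = len(features[0])
    return [[max(features[j][c] for j in nbrs) for c in range(d)] for nbrs in neighbors]

def mean_aggregate(features, neighbors):
    """Per-channel mean: every neighbor has the same influence."""
    d = len(features[0])
    return [[sum(features[j][c] for j in nbrs) / len(nbrs) for c in range(d)]
            for nbrs in neighbors]

def attention_aggregate(features, neighbors):
    """Centroid-to-neighbor attention: every neighbor contributes with its own weight.

    Assumption: identity projections stand in for learned query/key/value layers.
    """
    d = len(features[0])
    out = []
    for i, nbrs in enumerate(neighbors):
        q = features[i]  # the centroid acts as the query
        scores = [sum(a * b for a, b in zip(q, features[j])) / math.sqrt(d) for j in nbrs]
        weights = softmax(scores)
        agg = [sum(w * features[j][c] for w, j in zip(weights, nbrs)) for c in range(d)]
        # Residual connection on the centroid feature.
        out.append([q[c] + agg[c] for c in range(d)])
    return out

points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
feats = points
for _ in range(2):
    graph = knn_indices(feats, k=2)  # dynamic graph: rebuilt from the current features
    feats = attention_aggregate(feats, graph)
```

A static-graph variant would call `knn_indices` once on the input coordinates and reuse that graph in every layer; here the neighbor sets can change between layers because they are recomputed from the updated features.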
URL
https://arxiv.org/abs/2305.15213