Abstract
In this work, we address the challenging tasks of few-shot and zero-shot 3D point cloud semantic segmentation. The success of few-shot semantic segmentation in 2D computer vision is largely driven by pre-training on large-scale datasets such as ImageNet: a feature extractor pre-trained on large-scale 2D data greatly benefits 2D few-shot learning. However, the development of 3D deep learning is hindered by the limited volume and instance modality of datasets, owing to the significant cost of 3D data collection and annotation. This results in less representative features and large intra-class feature variation for few-shot 3D point cloud segmentation. As a consequence, directly extending popular prototypical methods from 2D few-shot classification/segmentation to 3D point cloud segmentation does not work as well as it does in the 2D domain. To address this issue, we propose a Query-Guided Prototype Adaption (QGPA) module that adapts prototypes from the support point cloud feature space to the query point cloud feature space. With such prototype adaption, we greatly alleviate the large intra-class feature variation in point clouds and significantly improve the performance of few-shot 3D segmentation. Besides, to enhance the representation power of the prototypes, we introduce a Self-Reconstruction (SR) module that enables each prototype to reconstruct its support mask as faithfully as possible. Moreover, we further consider zero-shot 3D point cloud semantic segmentation, where no support samples are available. To this end, we introduce category words as semantic information and propose a semantic-visual projection model to bridge the semantic and visual spaces. Our proposed method surpasses state-of-the-art algorithms by considerable margins of 7.90% and 14.82% under the 2-way 1-shot setting on the S3DIS and ScanNet benchmarks, respectively. Code is available at this https URL.
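To make the prototypical pipeline described above concrete, here is a minimal NumPy sketch of the general idea: a class prototype is computed from masked support features, then re-expressed in the query feature space before per-point matching. The function and variable names, the residual attention-style update, and all shapes are illustrative assumptions, not the paper's actual QGPA implementation.

```python
import numpy as np

def masked_average_prototype(support_feat, support_mask):
    """Standard masked average pooling: one class prototype from
    support point features (N, C) and a binary support mask (N,)."""
    denom = support_mask.sum() + 1e-8
    return (support_feat * support_mask[:, None]).sum(axis=0) / denom

def query_guided_adaption(prototype, query_feat):
    """Hypothetical sketch of query-guided adaption: the prototype
    attends over query point features and is updated with a residual
    mixture of them, shifting it toward the query feature space."""
    scores = query_feat @ prototype              # (N,) affinity to each query point
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax attention weights
    adapted = weights @ query_feat               # (C,) query-space summary
    return prototype + adapted                   # residual update (assumed form)

def segment(query_feat, prototypes):
    """Label each query point by cosine similarity to the prototypes."""
    q = query_feat / np.linalg.norm(query_feat, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (q @ p.T).argmax(axis=1)              # (N,) predicted class indices
```

The point of the residual update is that matching then happens between query features and a prototype already expressed in the query's own feature statistics, which is the intuition behind reducing intra-class variation between support and query clouds.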
URL
https://arxiv.org/abs/2305.14335