Abstract
The scale and quality of point cloud datasets constrain the advancement of point cloud learning. Recently, with the development of multi-modal learning, incorporating domain-agnostic prior knowledge from other modalities, such as images and text, to assist point cloud feature learning has emerged as a promising avenue. Existing methods have demonstrated the effectiveness of multi-modal contrastive training and feature distillation on point clouds. However, challenges remain, including the requirement for paired triplet data, redundancy and ambiguity in the supervised features, and disruption of the original priors. In this paper, we propose a language-assisted approach to point cloud feature learning (LAST-PCL), which enriches semantic concepts through LLM-based text enrichment. Using statistics-based, training-free selection of significant features, we remove redundancy and reduce feature dimensionality without compromising textual priors. We also analyze in depth the impact of text contrastive training on point clouds. Extensive experiments validate that the proposed method learns semantically meaningful point cloud features and achieves state-of-the-art or comparable performance in 3D semantic segmentation, 3D object detection, and 3D scene classification tasks. The source code is available at this https URL.
URL
https://arxiv.org/abs/2312.11451