Abstract
Self-supervised learning is attracting wide attention in point cloud processing. However, obtaining discriminative and transferable point cloud features for efficient training on downstream tasks remains unsolved, owing to the natural sparsity and irregularity of point clouds. We propose PointSmile, a reconstruction-free self-supervised learning paradigm that maximizes curriculum mutual information (CMI) across replicas of point cloud objects. From the perspective of how-and-what-to-learn, PointSmile imitates human curriculum learning: it starts with an easy curriculum and gradually increases its difficulty. To solve "how to learn", we introduce curriculum data augmentation (CDA) of point clouds. CDA encourages PointSmile to learn from easy samples first and hard ones later, so that the latent space is dynamically shaped to create better embeddings. To solve "what to learn", we propose maximizing both feature-wise and class-wise CMI to better extract discriminative features of point clouds. Unlike most existing methods, PointSmile requires neither a pretext task nor cross-modal data to yield rich latent representations. We demonstrate the effectiveness and robustness of PointSmile on downstream tasks including object classification and segmentation. Extensive results show that PointSmile outperforms existing self-supervised methods and compares favorably with popular fully supervised methods on various standard architectures.
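The abstract does not give the concrete form of CDA, but the stated idea, augmentation strength ramped from easy to hard over training, can be sketched as follows. This is a minimal illustration only, assuming jitter and rotation as the augmentations and a scalar training-progress schedule; it is not the paper's actual implementation.

```python
import numpy as np

def curriculum_augment(points, progress, max_jitter=0.02, max_angle=np.pi):
    """Augment a point cloud with strength that grows over training.

    points:   (N, 3) array of xyz coordinates
    progress: float in [0, 1]; 0 = start of training (easy samples),
              1 = end of training (hard samples)
    """
    strength = float(np.clip(progress, 0.0, 1.0))
    # Random rotation about the up (y) axis, angle scaled by the curriculum.
    angle = np.random.uniform(-max_angle, max_angle) * strength
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    out = points @ rot.T
    # Gaussian jitter whose std also scales with the curriculum.
    out = out + np.random.normal(scale=max_jitter * strength + 1e-8,
                                 size=out.shape)
    return out
```

At `progress=0` the augmentation is nearly an identity map, so early training sees "easy" replicas; as `progress` approaches 1 the replicas become progressively harder to match.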
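The abstract does not specify the paper's CMI objective. As background only, a common way to maximize mutual information between two augmented views of the same objects is to minimize an InfoNCE-style contrastive loss, which lower-bounds the MI; the sketch below assumes batch-aligned embeddings and is not PointSmile's exact formulation.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss between two batches of embeddings, where row i of z1
    and row i of z2 come from two replicas of the same object.
    Minimizing this loss maximizes a lower bound on the mutual
    information between the two views."""
    # L2-normalize so similarities are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature          # (B, B) similarity matrix
    # Row-wise log-softmax; positives sit on the diagonal.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Matched views score a low loss, while independent embeddings score close to `log(B)`, so gradient descent pulls replicas of the same object together in the latent space.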
URL
https://arxiv.org/abs/2301.12744