Paper Reading AI Learner

PointSmile: Point Self-supervised Learning via Curriculum Mutual Information

2023-01-30 09:18:54
Xin Li, Mingqiang Wei, Songcan Chen

Abstract

Self-supervised learning is attracting wide attention in point cloud processing. However, due to the natural sparsity and irregularity of point clouds, it remains unsolved how to obtain discriminative and transferable features for efficient training on downstream tasks. We propose PointSmile, a reconstruction-free self-supervised learning paradigm that maximizes curriculum mutual information (CMI) across replicas of point cloud objects. From the perspective of how-and-what-to-learn, PointSmile imitates human curriculum learning, i.e., it starts with an easy curriculum and gradually increases its difficulty. To solve "how-to-learn", we introduce curriculum data augmentation (CDA) of point clouds. CDA encourages PointSmile to learn from easy samples to hard ones, so that the latent space is dynamically shaped to create better embeddings. To solve "what-to-learn", we propose to maximize both feature- and class-wise CMI to better extract discriminative features of point clouds. Unlike most existing methods, PointSmile requires neither a pretext task nor cross-modal data to yield rich latent representations. We demonstrate the effectiveness and robustness of PointSmile on downstream tasks including object classification and segmentation. Extensive results show that PointSmile outperforms existing self-supervised methods and compares favorably with popular fully-supervised methods on various standard architectures.
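The easy-to-hard schedule behind CDA can be illustrated with a minimal sketch. The function below is hypothetical (it is not from the paper): it scales two common point cloud augmentations, coordinate jitter and random point dropout, by a difficulty factor that grows linearly with the training epoch, so early epochs see nearly clean (easy) samples and later epochs see heavily perturbed (hard) ones.

```python
import numpy as np

def curriculum_augment(points, epoch, max_epochs, rng=None):
    """Hypothetical sketch of curriculum data augmentation (CDA).

    points:     (N, 3) array of xyz coordinates.
    epoch:      current training epoch.
    max_epochs: total number of epochs; sets the curriculum length.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    difficulty = epoch / max_epochs  # 0.0 (easy) -> 1.0 (hard)

    # Jitter: Gaussian noise whose std grows with difficulty.
    noise = rng.normal(scale=0.01 + 0.04 * difficulty, size=points.shape)
    jittered = points + noise

    # Dropout: the fraction of discarded points grows with difficulty.
    keep = rng.random(len(points)) >= 0.3 * difficulty
    return jittered[keep]
```

At epoch 0 the sample is only lightly jittered and keeps all of its points; by the final epoch up to roughly 30% of points are dropped and the jitter is five times stronger. The specific augmentations, schedules, and magnitudes here are illustrative assumptions, not the paper's configuration.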

Abstract (translated)

Self-supervised learning is receiving wide attention in point cloud processing. However, obtaining discriminative and transferable features of point clouds for efficient training on downstream tasks remains unsolved, due to their natural sparsity and irregularity. We propose PointSmile, a reconstruction-free self-supervised learning paradigm that maximizes curriculum mutual information (CMI) across replicas of point cloud objects. From the perspective of how-and-what-to-learn, PointSmile imitates human curriculum learning, i.e., it starts with an easy curriculum and gradually raises its difficulty. To solve "how-to-learn", we introduce curriculum data augmentation (CDA) of point clouds. CDA encourages PointSmile to learn from easy samples to hard ones, so that the latent space can be dynamically shaped to create better embeddings. To solve "what-to-learn", we propose to maximize both feature- and class-wise CMI to better extract discriminative features of point clouds. Unlike most existing methods, PointSmile requires neither a pretext task nor cross-modal data to yield rich latent representations. We demonstrate the effectiveness and robustness of PointSmile on downstream tasks including object classification and segmentation. Extensive results show that PointSmile outperforms existing self-supervised methods and compares favorably with popular fully-supervised methods on various standard architectures.

URL

https://arxiv.org/abs/2301.12744

PDF

https://arxiv.org/pdf/2301.12744.pdf

