Paper Reading AI Learner

Learning from Mistakes: Self-Regularizing Hierarchical Semantic Representations in Point Cloud Segmentation

2023-01-26 14:52:30
Elena Camuffo, Umberto Michieli, Simone Milani

Abstract

Recent advances in autonomous robotic technologies have highlighted the growing need for precise environmental analysis. LiDAR semantic segmentation has gained attention as a way to achieve fine-grained scene understanding by acting directly on the raw content provided by sensors. Recent solutions have shown how different learning techniques can improve the performance of a model without any architectural or dataset change. Following this trend, we present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK) derived from a standard model. First, classes are clustered into macro groups according to mutual prediction errors; then, the learning process is regularized by: (1) aligning class-conditional prototypical feature representations for both fine and coarse classes, and (2) weighting instances with a per-class fairness index. Our LEAK approach is very general and can be seamlessly applied on top of any segmentation architecture; indeed, experimental results show that it enables state-of-the-art performance on different architectures, datasets and tasks, while ensuring more balanced class-wise results and faster convergence.
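The first step in the abstract, grouping classes by mutual prediction errors, can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the clustering rule, so the code below assumes a simple one (merge two classes when the sum of their row-normalized confusion rates in both directions exceeds a threshold). The names mutual_confusion and macro_groups, the threshold value, and the toy confusion matrix are all hypothetical.

```python
def mutual_confusion(conf):
    """Return a symmetric score matrix: rate(i -> j) + rate(j -> i).

    `conf[i][j]` counts samples of true class i predicted as class j.
    Rows are normalized so that frequent classes do not dominate.
    """
    n = len(conf)
    rates = []
    for row in conf:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    return [
        [0.0 if i == j else rates[i][j] + rates[j][i] for j in range(n)]
        for i in range(n)
    ]


def macro_groups(conf, threshold):
    """Merge classes whose mutual confusion exceeds `threshold` (assumed rule).

    Uses a small union-find so that chains of confused classes end up
    in the same macro group. Returns a sorted list of class-index lists.
    """
    n = len(conf)
    score = mutual_confusion(conf)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if score[i][j] > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for c in range(n):
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


# Toy confusion matrix (made-up numbers): classes 0/1 and 2/3 are
# frequently mistaken for each other, but rarely across the two pairs.
confusion = [
    [80, 15, 5, 0],
    [20, 75, 5, 0],
    [2, 3, 90, 5],
    [0, 0, 30, 70],
]
print(macro_groups(confusion, threshold=0.2))
```

With this toy matrix the two frequently confused pairs form the coarse groups; in a coarse-to-fine setup such groups would then act as the coarse labels alongside the original fine classes.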

URL

https://arxiv.org/abs/2301.11145

PDF

https://arxiv.org/pdf/2301.11145.pdf

