Paper Reading AI Learner

From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds

2023-01-31 18:58:41
Huan-ang Gao, Beiwen Tian, Pengfei Li, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Yurong Chen, Hongbin Zha

Abstract

Room layout estimation is a long-standing robotic vision task that benefits both environment sensing and motion planning. However, layout estimation from point clouds (PCs) still suffers from data scarcity due to the difficulty of annotation. We therefore address the semi-supervised setting of this task, building on the idea of exponential moving averaging of model weights. However, adapting this scheme to the state-of-the-art (SOTA) solution for PC-based layout estimation is not straightforward. To this end, we define a quad set matching strategy and several consistency losses based upon metrics tailored for layout quads. In addition, we propose a new online pseudo-label harvesting algorithm that decomposes the distribution of a hybrid distance measure between quads and the PC into two components. This technique needs no manual threshold selection and intuitively encourages quads to align with reliable layout points. Surprisingly, this framework also works in the fully-supervised setting, achieving a new SOTA on the ScanNet benchmark. Last but not least, we push the semi-supervised setting to the more realistic omni-supervised setting, demonstrating significantly improved performance on a newly annotated ARKitScenes test set. Our code, data, and models are released in this repository.
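
To make the training scheme described above more concrete, below is a minimal sketch of the two ingredients the abstract names: an exponential-moving-average (mean-teacher style) weight update, and a threshold-free pseudo-label selection that decomposes a 1-D distance distribution into two components (here instantiated with a two-component Gaussian mixture). This is an illustration under assumptions, not the authors' released implementation; the names ema_update and select_pseudo_labels, the toy model, and the GMM choice are ours.

# Illustrative sketch (not the paper's released code): EMA teacher update plus
# two-component decomposition of quad-to-point-cloud distances for pseudo-label
# selection without a manual threshold. All names here are hypothetical.
import copy

import numpy as np
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture


@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, momentum: float = 0.999):
    """Exponential moving average of the student weights into the teacher."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)


def select_pseudo_labels(distances: np.ndarray) -> np.ndarray:
    """Fit a two-component mixture to a 1-D distance distribution and return a
    boolean mask for the low-distance (more reliable) component."""
    x = distances.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    assignments = gmm.predict(x)
    reliable_component = int(np.argmin(gmm.means_.ravel()))
    return assignments == reliable_component


if __name__ == "__main__":
    student = nn.Linear(8, 4)            # stand-in for the layout estimator
    teacher = copy.deepcopy(student)     # teacher starts as a copy of the student
    ema_update(teacher, student)         # called once per training step in practice

    # Synthetic quad-to-PC distances: a tight "reliable" mode plus a spread-out mode.
    rng = np.random.default_rng(0)
    d = np.concatenate([rng.normal(0.05, 0.01, 60), rng.normal(0.5, 0.1, 40)])
    mask = select_pseudo_labels(d)
    print(f"kept {mask.sum()} of {d.size} quads as pseudo-labels")

In a semi-supervised pipeline of this kind, the selected quads would typically serve as targets for the consistency losses on unlabeled scans, while the EMA teacher provides the predictions from which the distances are computed; the exact matching strategy and hybrid distance measure are defined in the paper itself.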

Abstract (translated)

Room layout estimation is a long-standing robotic vision task that benefits both environment sensing and motion planning. However, layout estimation using point clouds (PCs) still suffers from data scarcity due to annotation difficulty. We therefore address the semi-supervised setting of this task based on the idea of exponential moving averaging of model weights. However, adapting this scheme to the state-of-the-art solution for PC-based layout estimation is not straightforward. To this end, we define a quad set matching strategy and several consistency losses based on metrics tailored for layout quads. In addition, we propose a new online pseudo-label harvesting algorithm that decomposes the distribution of a hybrid distance measure between quads and the PC into two components. This technique requires no manual threshold selection and intuitively encourages quads to align with reliable layout points. Surprisingly, this framework also works in the fully-supervised setting, achieving a new state of the art on the ScanNet benchmark. Last but not least, we also push the semi-supervised setting to the realistic omni-supervised setting, demonstrating significantly improved performance on a newly annotated ARKitScenes test set. Our code, data, and models are released in this repository.

URL

https://arxiv.org/abs/2301.13865

PDF

https://arxiv.org/pdf/2301.13865.pdf

