Paper Reading AI Learner

3D-SIC: 3D Semantic Instance Completion for RGB-D Scans

2019-04-26 18:33:34
Ji Hou, Angela Dai, Matthias Nießner


This paper focuses on the task of semantic instance completion: from an incomplete RGB-D scan of a scene, we aim to detect the individual object instances comprising the scene and jointly infer their complete object geometry. This enables a semantically meaningful decomposition of a scanned scene into individual, complete 3D objects, which opens up many new possibilities for meaningful interaction with a scene, for instance by virtual or robotic agents. Rather than considering 3D semantic instance segmentation and scan completion separately, we propose 3D-SIC, a new end-to-end 3D convolutional neural network which jointly learns to detect object instances and predict their complete geometry, achieving significantly better performance than treating these tasks independently. 3D-SIC leverages joint color-geometry feature learning and a fully-convolutional 3D network to effectively infer semantic instance completion for 3D scans at scale. Our method runs at interactive rates, taking several seconds of inference time on scenes of $30\,\mathrm{m} \times 25\,\mathrm{m}$ spatial extent. We additionally introduce a new semantic instance completion benchmark on real scan data, where we outperform alternative approaches by over 15 in mAP@0.5.



