Abstract
Indoor scenes exhibit rich hierarchical structure in 3D object layouts. Many tasks in 3D scene understanding can benefit from reasoning jointly about the hierarchical context of a scene and the identities of its objects. We present a variational denoising recursive autoencoder (VDRAE) that generates and iteratively refines a hierarchical representation of 3D object layouts, interleaving bottom-up encoding for context aggregation and top-down decoding for propagation. We train our VDRAE on large-scale 3D scene datasets to predict both instance-level segmentations and 3D object detections from an over-segmentation of an input point cloud. We show that our VDRAE improves object detection performance on real-world 3D point cloud datasets compared to baselines from prior work.
URL
https://arxiv.org/abs/1903.03757