Paper Reading AI Learner

Deep Hough Voting for 3D Object Detection in Point Clouds

2019-04-21 21:36:36
Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas

Abstract

Current 3D object detection methods are heavily influenced by 2D detectors. To leverage architectures from 2D detectors, they often convert 3D point clouds to regular grids (i.e., voxel grids or bird's-eye-view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to detect objects directly in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data that is as generic as possible. However, due to the sparse nature of the data -- samples from 2D manifolds in 3D space -- we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point, and thus hard to regress accurately in one step. To address this challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D, with a simple design, compact model size and high efficiency. Remarkably, VoteNet outperforms previous methods using purely geometric information, without relying on color images.
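The abstract's core idea -- surface points cast votes toward their object's centroid, and clusters of votes propose object centers -- can be sketched in a few lines. Below is a toy NumPy illustration, not the authors' implementation: the vote offsets are given as input rather than predicted by a deep network, and the greedy radius-based grouping is a simplified stand-in for VoteNet's learned vote aggregation. All function and parameter names here are hypothetical.

```python
import numpy as np

def hough_vote_centers(points, offsets, radius=0.3, min_votes=5):
    """Toy sketch of the Hough-voting idea: each surface point casts a
    vote (point + offset) toward its object's centroid; votes are then
    greedily grouped, and each sufficiently large group's mean becomes
    a candidate object center. In VoteNet the offsets come from a deep
    point set network; here they are simply given."""
    votes = points + offsets  # each point votes for a centroid location
    remaining = list(range(len(votes)))
    centers = []
    while remaining:
        seed = votes[remaining[0]]
        # collect all votes within `radius` of the seed vote
        group = [i for i in remaining
                 if np.linalg.norm(votes[i] - seed) < radius]
        if len(group) >= min_votes:
            centers.append(votes[group].mean(axis=0))
        remaining = [i for i in remaining if i not in group]
    return np.array(centers)

# Demo: points sampled on a unit sphere around a known center. The
# centroid is far from every surface point (the difficulty the abstract
# describes), but perfect votes recover it exactly.
rng = np.random.default_rng(0)
center = np.array([1.0, 1.0, 1.0])
dirs = rng.normal(size=(64, 3))
pts = center + dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
offsets = center - pts          # ideal votes, for illustration only
found = hough_vote_centers(pts, offsets)
```

With ideal offsets every vote lands on the true centroid, so the grouping returns a single center at `(1, 1, 1)`; with network-predicted (noisy) votes, the grouping radius trades off merging nearby objects against splitting one object's votes.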

URL

https://arxiv.org/abs/1904.09664

PDF

https://arxiv.org/pdf/1904.09664.pdf

