Abstract
As an early work in this direction, NeRF-Det unifies novel view synthesis and 3D perception, demonstrating that perception tasks can benefit from novel view synthesis methods such as NeRF and significantly improving indoor multi-view 3D object detection. Two components account for this improvement: the geometry MLP of NeRF, which directs the attention of the detection head to crucial regions, and a self-supervised loss from novel view rendering. To better exploit the continuous spatial representation that neural rendering provides, we introduce a novel 3D perception network structure, NeRF-DetS. The key component of NeRF-DetS is the Multi-level Sampling-Adaptive Network, which makes the sampling process adaptive, from coarse to fine. We also propose an improved multi-view information fusion method, Multi-head Weighted Fusion. This approach addresses the loss of multi-view information that occurs when features are fused by arithmetic mean, while keeping computational cost low. NeRF-DetS outperforms the competitive NeRF-Det baseline on the ScanNetV2 dataset, improving mAP@.25 and mAP@.50 by 5.02% and 5.92%, respectively.
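The abstract contrasts arithmetic-mean fusion with Multi-head Weighted Fusion but does not describe the latter's internals. The sketch below is only one plausible reading, not the authors' implementation: channels are split into heads, each head scores every valid view against a vector, and a softmax over views gives per-head fusion weights. The function names, the `head_queries` parameter (standing in for learned per-head weights), and the dot-product scoring are all assumptions made for illustration.

```python
import math


def mean_fusion(view_feats, valid):
    """Arithmetic mean over valid views: every view contributes equally."""
    kept = [f for f, ok in zip(view_feats, valid) if ok]
    if not kept:
        return [0.0] * len(view_feats[0])
    return [sum(col) / len(kept) for col in zip(*kept)]


def multi_head_weighted_fusion(view_feats, valid, head_queries):
    """Hypothetical per-head weighted fusion of multi-view features.

    view_feats:   one feature vector (length C) per view for a single 3D point
    valid:        one bool per view, True if the point projects into that view
    head_queries: one scoring vector per head (assumed stand-in for learned
                  parameters), each of length C // num_heads
    """
    channels = len(view_feats[0])
    num_heads = len(head_queries)
    if channels % num_heads != 0:
        raise ValueError("channel count must be divisible by number of heads")
    head_dim = channels // num_heads

    fused = []
    for h, query in enumerate(head_queries):
        lo, hi = h * head_dim, (h + 1) * head_dim
        # Channel slice of this head, restricted to valid views.
        slices = [f[lo:hi] for f, ok in zip(view_feats, valid) if ok]
        if not slices:
            fused.extend([0.0] * head_dim)
            continue
        # Score each view, then normalise with a numerically stable softmax.
        scores = [sum(q * x for q, x in zip(query, s)) for s in slices]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum over views for every channel of this head.
        fused.extend(
            sum(w * s[c] for w, s in zip(weights, slices))
            for c in range(head_dim)
        )
    return fused
```

With all-zero scoring vectors the softmax weights are uniform and the result reduces to the arithmetic mean. Non-zero vectors let each head emphasise different views, which is the kind of per-view information a plain mean discards.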
URL
https://arxiv.org/abs/2404.13921