Abstract
Seeing only a tiny part of the whole does not convey the full circumstances. Bird's-eye-view (BEV) perception, the process of obtaining allocentric maps from egocentric views, is restricted when using a narrow Field of View (FoV) alone. In this work, mapping from 360° panoramas to BEV semantics, the 360BEV task, is established for the first time to achieve holistic representations of indoor scenes in a top-down view. Instead of relying on narrow-FoV image sequences, a single panoramic image with depth information is sufficient to generate a holistic BEV semantic map. To benchmark 360BEV, we present two indoor datasets, 360BEV-Matterport and 360BEV-Stanford, both of which include egocentric panoramic images and semantic segmentation labels, as well as allocentric semantic maps. Besides delving deep into different mapping paradigms, we propose a dedicated solution for panoramic semantic mapping, namely 360Mapper. Through extensive experiments, our method achieves 44.32% and 45.78% mIoU on the two datasets, respectively, surpassing previous counterparts by +7.60% and +9.70% mIoU. Code and datasets will be available at: \url{this https URL}.
URL
https://arxiv.org/abs/2303.11910