Abstract
In robotics and computer vision, efficient and accurate semantic mapping remains a significant challenge, driven by the growing demand for intelligent machines that can comprehend and interact with complex environments. Conventional panoptic mapping methods, however, are restricted to predefined semantic classes, making them ineffective at handling novel or unforeseen objects. To address this limitation, we introduce the Unified Promptable Panoptic Mapping (UPPM) method. UPPM leverages recent advances in foundation models to enable real-time, on-demand label generation from natural-language prompts. By incorporating a dynamic labeling strategy into traditional panoptic mapping techniques, UPPM significantly improves adaptability and versatility while maintaining high map-reconstruction performance. We evaluate our approach on real-world and simulated datasets. Results show that UPPM can accurately reconstruct scenes and segment objects while generating rich semantic labels through natural-language interaction. A series of ablation experiments validates the advantages of foundation-model-based labeling over fixed label sets.
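The core idea — replacing a fixed class list with labels produced on demand by a promptable model — can be illustrated with a minimal sketch. Everything below is hypothetical (the class names, the `relabel` interface, and the callable standing in for a vision-language model are illustrative assumptions, not the paper's implementation):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Segment:
    # One panoptic segment; geometry is omitted, only labeling is sketched.
    segment_id: int
    label: str = "unknown"

@dataclass
class PromptablePanopticMap:
    # Hypothetical sketch: segments whose labels can be (re)assigned
    # on demand instead of being drawn from a fixed class list.
    segments: Dict[int, Segment] = field(default_factory=dict)

    def add_segment(self, segment_id: int) -> None:
        self.segments[segment_id] = Segment(segment_id)

    def relabel(self, prompt_to_label: Callable[[int], str]) -> None:
        # prompt_to_label stands in for a foundation model answering a
        # natural-language prompt for each segment.
        for seg in self.segments.values():
            seg.label = prompt_to_label(seg.segment_id)

m = PromptablePanopticMap()
m.add_segment(1)
m.add_segment(2)
# Stand-in for a vision-language model queried with a prompt.
m.relabel(lambda sid: {1: "coffee mug", 2: "office chair"}[sid])
print(m.segments[1].label)  # → coffee mug
```

The point of the design is that the label vocabulary lives in the prompt-answering callable, not in the map, so novel objects never require retraining or re-enumerating classes.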
URL
https://arxiv.org/abs/2405.02162