DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

2024-02-29 10:03:57
Ji Ma, Hongming Dai, Yao Mu, Pengying Wu, Hao Wang, Xiaowei Chi, Yang Fei, Shanghang Zhang, Chang Liu


Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments, and it has emerged as a particularly challenging task in Embodied AI. Existing datasets for developing ZSON algorithms do not account for dynamic obstacles, object attribute diversity, or scene text, and therefore differ noticeably from real-world conditions. To address these issues, we propose a Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE), which comprises ten high-fidelity 3D scenes with over 18k tasks and aims to mimic complex, dynamic real-world scenarios. Specifically, DOZE scenes feature multiple moving humanoid obstacles, a wide array of open-vocabulary objects, diverse objects with distinct attributes, and valuable textual hints. In addition, unlike existing datasets that only provide collision checking between the agent and static obstacles, DOZE also supports detecting collisions between the agent and moving obstacles. This functionality enables evaluation of agents' collision avoidance abilities in dynamic environments. We test four representative ZSON methods on DOZE, revealing substantial room for improvement in existing approaches with respect to navigation efficiency, safety, and object recognition accuracy. Our dataset can be found at this https URL.




