Abstract
Generalization of end-to-end deep reinforcement learning (DRL) for object-goal visual navigation is a long-standing challenge, since object classes and placements vary across test environments. Learning a domain-independent visual representation is critical for enabling a trained DRL agent to generalize to unseen scenes and objects. In this letter, a target-directed attention network (TDANet) is proposed to learn an end-to-end object-goal visual navigation policy with zero-shot ability. TDANet features a novel target attention (TA) module that learns both the spatial and semantic relationships among objects, helping TDANet focus on the observed objects most relevant to the target. With its Siamese architecture (SA) design, TDANet distinguishes the current state from the target state and generates a domain-independent visual representation. To evaluate the navigation performance of TDANet, extensive experiments are conducted in the AI2-THOR embodied AI environment. The simulation results demonstrate TDANet's strong generalization to unseen scenes and target objects, with a higher navigation success rate (SR) and Success weighted by Path Length (SPL) than other state-of-the-art models.
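The abstract names two mechanisms: a target attention module that weights observed objects by their relevance to the target, and a Siamese (shared-weight) design that compares the current and target states. The sketch below is a minimal, illustrative rendering of those two ideas in plain Python; the function names, the scaled-dot-product similarity, and the element-wise difference are assumptions, not the paper's actual implementation.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def target_attention(obj_feats, target_feat):
    """Weight each observed object's feature vector by its (assumed
    scaled-dot-product) similarity to the target feature, so the
    aggregated feature focuses on target-relevant objects."""
    d = len(target_feat)
    scores = [dot(f, target_feat) / math.sqrt(d) for f in obj_feats]
    weights = softmax(scores)
    return [sum(w * f[i] for w, f in zip(weights, obj_feats))
            for i in range(d)]

def siamese_difference(current_feat, target_feat):
    """Both states pass through shared-weight branches (identical here);
    their element-wise difference serves as a domain-independent
    representation of "how far" the current state is from the target."""
    return [c - t for c, t in zip(current_feat, target_feat)]
```

Usage-wise, the attended feature of the current observation and the target feature would feed the two Siamese branches, and the resulting difference vector would go to the navigation policy head.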
URL
https://arxiv.org/abs/2404.08353