Language-Aligned Waypoint Supervision for Vision-and-Language Navigation in Continuous Environments

2021-09-30 15:28:24

Sonia Raychaudhuri, Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang

arXiv_CV Pose Action 3D Embodied Agent

Abstract

In the Vision-and-Language Navigation (VLN) task, an embodied agent navigates a 3D environment by following natural language instructions. A challenge in this task is how to handle "off-the-path" scenarios, where an agent veers from a reference path. Prior work supervises the agent with actions based on the shortest path from the agent's location to the goal, but such goal-oriented supervision is often not aligned with the instruction. Furthermore, the evaluation metrics employed by prior work do not measure how much of a language instruction the agent is able to follow. In this work, we propose a simple and effective language-aligned supervision scheme, and a new metric that measures the number of sub-instructions the agent has completed during navigation.
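
The contrast between goal-oriented and language-aligned supervision can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the nearest-waypoint rule, the 2D coordinates, and the 0.5 success radius are all assumptions made here for illustration, and the counting function only approximates the idea of a sub-instruction metric by treating each reference waypoint as one sub-instruction.

```python
import math

def nearest_index(path, pos):
    # Index of the reference waypoint closest to the agent (hypothetical helper).
    return min(range(len(path)), key=lambda i: math.dist(path[i], pos))

def goal_oriented_target(path, pos):
    # Goal-oriented supervision: always steer toward the final goal,
    # regardless of where the instruction says to go next.
    return path[-1]

def language_aligned_target(path, pos):
    # Assumed language-aligned rule: steer toward the waypoint that follows
    # the nearest one on the reference path, so an off-path agent is pulled
    # back onto the route the instruction describes.
    i = nearest_index(path, pos)
    return path[min(i + 1, len(path) - 1)]

def waypoints_completed(path, trajectory, radius=0.5):
    # Assumed proxy for a sub-instruction metric: count reference waypoints
    # reached in order, each within `radius` of some trajectory position.
    done = 0
    for pos in trajectory:
        while done < len(path) and math.dist(path[done], pos) <= radius:
            done += 1
    return done

reference_path = [(0, 0), (2, 0), (2, 2), (4, 2)]
agent_position = (0.5, 1.0)  # the agent has drifted off the reference path
print(goal_oriented_target(reference_path, agent_position))
print(language_aligned_target(reference_path, agent_position))
```

In this toy setting the goal-oriented target is the final waypoint, while the language-aligned target is the next waypoint along the reference path, which is the behaviour the abstract argues is better matched to the instruction.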

URL

https://arxiv.org/abs/2109.15207

PDF

https://arxiv.org/pdf/2109.15207.pdf