Paper Reading AI Learner

Action-conditioned Deep Visual Prediction with RoAM, a new Indoor Human Motion Dataset for Autonomous Robots

2023-06-28 00:58:44
Meenakshi Sarkar, Vinayak Honkote, Dibyendu Das, Debasish Ghose

Abstract

With the increasing adoption of robots across industries, it is crucial to develop advanced algorithms that enable robots to anticipate, comprehend, and plan their actions effectively in collaboration with humans. We introduce the Robot Autonomous Motion (RoAM) video dataset, collected with a custom-made TurtleBot3 Burger robot in a variety of indoor environments, recording various human motions from the robot's ego-vision. The dataset also includes synchronized LiDAR scans and all control actions taken by the robot as it navigates around static and moving human agents. This unique dataset offers an opportunity to develop and benchmark new visual prediction frameworks that predict future image frames based on the actions taken by the recording agent, in partially observable scenarios or in cases where the imaging sensor is mounted on a moving platform. We benchmark the dataset on our novel deep visual prediction framework, ACPNet, in which the predicted future image frames are also conditioned on the actions taken by the robot, and demonstrate its potential for incorporating robot dynamics into the video prediction paradigm for mobile robotics and autonomous navigation research.
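The core idea in the abstract, conditioning predicted future frames on the recording robot's own control actions, can be illustrated with a minimal sketch. This is not the authors' ACPNet: the network below, the two-dimensional (linear velocity, angular velocity) action vector, and all layer shapes and names are illustrative assumptions.

```python
# Minimal sketch of action-conditioned frame prediction (assumed design,
# not the paper's ACPNet): predict the next ego-vision frame from the
# current frame plus the robot's control action.
import torch
import torch.nn as nn

class ActionConditionedPredictor(nn.Module):
    def __init__(self, action_dim: int = 2, hidden: int = 64):
        super().__init__()
        # Encode the current RGB frame into a spatial feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Project the action vector so it can be broadcast over the feature map.
        self.action_proj = nn.Linear(action_dim, hidden)
        # Decode the action-fused features back to an image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(hidden, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(frame)           # (B, hidden, H/4, W/4)
        act = self.action_proj(action)       # (B, hidden)
        feat = feat + act[:, :, None, None]  # fuse the action into the features
        return self.decoder(feat)            # predicted next frame, (B, 3, H, W)

model = ActionConditionedPredictor()
frame = torch.rand(1, 3, 64, 64)       # current ego-vision frame
action = torch.tensor([[0.2, 0.05]])   # assumed (linear vel, angular vel) command
next_frame = model(frame, action)
print(next_frame.shape)  # torch.Size([1, 3, 64, 64])
```

Broadcasting the projected action over the encoder's feature map is just one simple fusion choice; ACPNet's actual conditioning mechanism, inputs, and losses are described in the paper itself.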

URL

https://arxiv.org/abs/2306.15852

PDF

https://arxiv.org/pdf/2306.15852.pdf

