Learning to View: Decision Transformers for Active Object Detection

Abstract

Active perception describes a broad class of techniques that couple planning and perception so that the robot moves in ways that give it more information about its environment. In most robotic systems, perception is independent of motion planning. For example, traditional object detection is passive: it operates only on the images it receives. However, results can improve if planning is allowed to consume detection signals and move the robot to collect views that maximize detection quality. In this paper, we use reinforcement learning (RL) methods to control the robot in order to obtain images that maximize the detection quality. Specifically, we propose using a Decision Transformer with online fine-tuning, which first optimizes the policy on a pre-collected expert dataset and then improves the learned policy by exploring better solutions in the environment. We evaluate the performance of the proposed method on an interactive dataset collected from an indoor scenario simulator. Experimental results demonstrate that our method outperforms all baselines, including the expert policy and pure offline RL methods. We also provide exhaustive analyses of the reward distribution and observation space.
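
As a rough illustration of how a Decision Transformer is conditioned, the sketch below computes the return-to-go sequence that such models take as input alongside states and actions. It is not taken from the paper: the function name `returns_to_go`, the `gamma` parameter, and the example rewards (standing in for per-step detection-quality scores) are all assumptions made for this example.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return, for each step t, the (optionally discounted) sum of rewards from t onward.

    Hypothetical helper: Decision Transformers condition each action on this
    quantity, but the paper's own reward definition is not given in the abstract.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step reuses the suffix sum.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Assumed per-step rewards for a three-step trajectory of camera moves.
example_rewards = [0.25, 0.5, 0.25]
print(returns_to_go(example_rewards))  # [1.0, 0.75, 0.25]
```

During offline training the return-to-go values would come from the expert trajectories; during online fine-tuning a target return is chosen up front and reduced by each observed reward as the episode unfolds.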

URL

https://arxiv.org/abs/2301.09544

PDF

https://arxiv.org/pdf/2301.09544.pdf