Abstract
Many real-world tasks, from house-cleaning to cooking, can be formulated as multi-object rearrangement problems, in which an agent must bring specific objects into appropriate goal states. For such problems, we focus on a setting that assumes a pre-specified goal state, perfect manipulation and object recognition capabilities, and a static map of the environment, but unknown initial locations of the objects to be rearranged. Our goal is to enable home-assistive intelligent agents to plan efficiently for rearrangement under such partial observability. This requires efficient trade-offs between exploring the environment and planning for rearrangement, which is challenging because of the long-horizon nature of the problem. To make progress on this problem, we first use classical methods to analyze how factors such as the number of objects and receptacles, agent carrying capacity, and environment layout affect exploration and planning for rearrangement. We then investigate both monolithic and modular deep reinforcement learning (DRL) methods for planning in our setting. We find that monolithic DRL methods do not succeed at the long-horizon planning needed for multi-object rearrangement. Instead, modular greedy approaches surprisingly perform reasonably well and emerge as competitive baselines for planning with partial observability in multi-object rearrangement problems. We also show that our greedy modular agents are empirically optimal when the objects that need to be rearranged are uniformly distributed in the environment, thereby contributing strong baselines for future work on multi-object rearrangement planning in partially observable settings.
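To make the setting concrete, the following is a minimal illustrative sketch of a greedy agent under the assumptions stated above: receptacle positions are known (static map), object locations are revealed only when a receptacle is visited, and pick and place always succeed. It is not the paper's agent. The function name `greedy_rearrange`, the Manhattan-distance cost, and the nearest-target rule are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Travel cost between two map positions (an illustrative assumption)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_rearrange(
    receptacles: Dict[str, Point],          # static map, known to the agent
    hidden_contents: Dict[str, List[str]],  # true initial placement, hidden until visited
    goals: Dict[str, str],                  # object -> goal receptacle (pre-specified)
    start: Point,
    capacity: int,
) -> Tuple[int, List[str]]:
    """Hypothetical greedy policy: always move to the nearest useful receptacle.

    Returns the total distance travelled and the sequence of visited receptacles.
    """
    contents = {r: list(hidden_contents.get(r, [])) for r in receptacles}
    seen = set()              # receptacles whose contents have been observed
    carrying: List[str] = []
    pos, cost, route = start, 0, []

    while True:
        # Stop once every goal object has been observed at its goal receptacle.
        if all(goals[o] in seen and o in contents[goals[o]] for o in goals):
            break

        targets = {goals[o] for o in carrying}                # deliver carried objects
        targets.update(r for r in receptacles if r not in seen)  # explore
        if len(carrying) < capacity:
            # Revisit observed receptacles that still hold misplaced objects.
            targets.update(
                r for r in seen
                if any(o in goals and goals[o] != r for o in contents[r])
            )
        if not targets:
            break  # nothing useful left to do (e.g. a goal object does not exist)

        # Greedy choice: nearest target; sorting makes tie-breaking deterministic.
        nxt = min(sorted(targets), key=lambda r: manhattan(pos, receptacles[r]))
        cost += manhattan(pos, receptacles[nxt])
        pos = receptacles[nxt]
        route.append(nxt)
        seen.add(nxt)

        # Drop every carried object whose goal is this receptacle.
        for obj in [o for o in carrying if goals[o] == nxt]:
            carrying.remove(obj)
            contents[nxt].append(obj)

        # Pick up misplaced objects until the carrying capacity is reached.
        for obj in [o for o in contents[nxt] if o in goals and goals[o] != nxt]:
            if len(carrying) >= capacity:
                break
            contents[nxt].remove(obj)
            carrying.append(obj)

    return cost, route
```

In this toy version, raising `capacity` lets the agent collect several misplaced objects before delivering them, which is one of the factors the abstract lists as affecting the exploration and planning trade-off.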
URL
https://arxiv.org/abs/2301.09854