Polycraft World AI Lab: An Extensible Platform for Evaluating Artificial Intelligence Agents

2023-01-27 18:08:04

Stephen A. Goss, Robert J. Steininger, Dhruv Narayanan, Daniel V. Olivença, Yutong Sun, Peng Qiu, Jim Amato, Eberhard O. Voit, Walter E. Voit, Eric J. Kildebeck

arXiv_AI

arXiv_AI Action Agent

Abstract

As artificial intelligence research advances, the platforms used to evaluate AI agents must adapt and grow to keep challenging them. We present the Polycraft World AI Lab (PAL), a task simulator with an API based on the Minecraft mod Polycraft World. The platform lets AI agents with different architectures interact easily with the Minecraft world, train, and be evaluated on multiple tasks. PAL supports flexible task creation and can manipulate any aspect of a task during an evaluation. All actions taken by AI agents and external actors (non-player characters, NPCs) in the open-world environment are logged to streamline evaluation. Here we present two custom tasks on the PAL platform, one focused on multi-step planning and one on navigation, along with evaluations of agents solving them. In summary, we report a versatile and extensible AI evaluation platform with a low barrier to entry for AI researchers.

URL

https://arxiv.org/abs/2301.11891

PDF

https://arxiv.org/pdf/2301.11891.pdf