Polycraft World AI Lab: An Extensible Platform for Evaluating Artificial Intelligence Agents

2023-01-27 18:08:04
Stephen A. Goss, Robert J. Steininger, Dhruv Narayanan, Daniel V. Olivença, Yutong Sun, Peng Qiu, Jim Amato, Eberhard O. Voit, Walter E. Voit, Eric J. Kildebeck

Abstract

As artificial intelligence research advances, the platforms used to evaluate AI agents must adapt and grow to keep challenging them. We present the Polycraft World AI Lab (PAL), a task simulator with an API based on the Minecraft mod Polycraft World. The platform is built so that AI agents with different architectures can easily interact with the Minecraft world, train, and be evaluated on multiple tasks. PAL allows tasks to be created flexibly and any aspect of a task to be manipulated during an evaluation. All actions taken by AI agents and external actors (non-player characters, NPCs) in the open-world environment are logged to streamline evaluation. Here we present two custom tasks on the PAL platform, one focused on multi-step planning and one focused on navigation, along with evaluations of agents solving them. In summary, we report a versatile and extensible AI evaluation platform with a low barrier to entry for AI researchers.

URL

https://arxiv.org/abs/2301.11891

PDF

https://arxiv.org/pdf/2301.11891.pdf

