
Instruction-Driven Game Engines on Large Language Models

2024-03-30 08:02:16
Hongqiu Wu, Yan Wang, Xingyuan Liu, Hai Zhao, Min Zhang

Abstract

The Instruction-Driven Game Engine (IDGE) project aims to democratize game development by enabling a large language model (LLM) to follow free-form game rules and autonomously generate game-play processes. The IDGE allows users to create games by issuing simple natural language instructions, which significantly lowers the barrier to game development. We approach the learning process for IDGEs as a Next State Prediction task, wherein the model autoregressively predicts in-game states given player actions. It is a challenging task because the computation of in-game states must be precise; otherwise, slight errors could disrupt the game-play. To address this, we train the IDGE in a curriculum manner that progressively increases the model's exposure to complex scenarios. Our initial progress lies in developing an IDGE for Poker, a universally cherished card game. The engine we have designed not only supports a wide range of poker variants but also allows for high customization of rules through natural language inputs. Furthermore, it enables rapid prototyping of new games from minimal samples, suggesting a paradigm of game development that relies on minimal prompt and data engineering. This work lays the groundwork for future advancements in instruction-driven game creation, potentially transforming how games are designed and played.
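
To make the Next State Prediction framing concrete, the sketch below shows one way a single engine turn could be serialized for an LLM: the free-form rules, the current state, and the player's action go in, and the model is asked to generate the next state. This is a minimal illustration under assumed names and prompt layout, not the paper's actual state encoding or training setup.

    # Minimal sketch of the Next State Prediction loop described in the
    # abstract; the section labels, field names, and the `llm` callable are
    # illustrative assumptions, not the paper's prompt format.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class GameTurn:
        rule: str    # free-form natural-language game rules (the instruction)
        state: str   # serialized current in-game state
        action: str  # the player's action for this turn

    def build_prompt(turn: GameTurn) -> str:
        """Concatenate rules, current state, and action into one prompt."""
        return (
            f"Game rules:\n{turn.rule}\n\n"
            f"Current state:\n{turn.state}\n\n"
            f"Player action:\n{turn.action}\n\n"
            f"Next state:\n"
        )

    def next_state(llm: Callable[[str], str], turn: GameTurn) -> str:
        """Predict the next in-game state with any text-completion callable.

        The engine loop feeds each prediction back in as the next `state`,
        so a single imprecise state can derail the rest of the game -- the
        failure mode the curriculum training is meant to mitigate.
        """
        return llm(build_prompt(turn))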

Abstract (translated)

The Instruction-Driven Game Engine (IDGE) project aims to democratize game development by enabling a large language model (LLM) to follow free-form game rules and autonomously generate the game process; users can create games by issuing simple natural language instructions, which greatly lowers the barrier to game development. We treat the learning process of the IDGE as a Next State Prediction task, in which the model autoregressively predicts the game state given player actions. This is a challenging task because the computation of game states must be accurate; otherwise, small errors could disrupt the game process. To address this, we train the IDGE in a curriculum manner, gradually increasing the model's exposure to complex scenarios. As initial progress, we develop an IDGE for Poker, a card game cherished by players worldwide. The engine we design not only supports a variety of poker variants but also allows users to highly customize rules through natural language input. In addition, it encourages rapid prototyping of new games, offering an innovative game-development paradigm that relies on minimal prompt and data engineering. This work lays the groundwork for future development in instruction-driven game creation and has the potential to change how games are designed and played.

URL

https://arxiv.org/abs/2404.00276

PDF

https://arxiv.org/pdf/2404.00276.pdf

