Paper Reading AI Learner

Online Reinforcement Learning-Based Dynamic Adaptive Evaluation Function for Real-Time Strategy Tasks

2025-01-07 14:36:33
Weilong Yang, Jie Zhang, Xunyun Liu, Yanqing Ye

Abstract

Effective evaluation of real-time strategy tasks requires adaptive mechanisms to cope with dynamic and unpredictable environments. This study proposes a method to improve evaluation functions for real-time responsiveness to battlefield situation changes, utilizing an online reinforcement learning-based dynamic weight adjustment mechanism within the real-time strategy game. Building on traditional static evaluation functions, the method employs gradient descent in online reinforcement learning to update weights dynamically, incorporating weight decay techniques to ensure stability. Additionally, the AdamW optimizer is integrated to adjust the learning rate and decay rate of online reinforcement learning in real time, further reducing the dependency on manual parameter tuning. Round-robin competition experiments demonstrate that this method significantly enhances the application effectiveness of the Lanchester combat model evaluation function, Simple evaluation function, and Simple Sqrt evaluation function in planning algorithms including IDABCD, IDRTMinimax, and Portfolio AI. The method achieves a notable improvement in scores, with the enhancement becoming more pronounced as the map size increases. Furthermore, the increase in evaluation function computation time induced by this method is kept below 6% for all evaluation functions and planning algorithms. The proposed dynamic adaptive evaluation function demonstrates a promising approach for real-time strategy task evaluation.
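The core idea described above — a linear evaluation function whose weights are updated online by gradient descent with AdamW-style decoupled weight decay — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature values, learning rate, and TD-style target are hypothetical, and the AdamW update is written out by hand to keep the example dependency-free.

```python
import math

class AdamW:
    """Minimal AdamW optimizer: Adam with decoupled weight decay."""
    def __init__(self, dim, lr=1e-2, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
        self.lr, self.b1, self.b2 = lr, betas[0], betas[1]
        self.eps, self.wd = eps, weight_decay
        self.m = [0.0] * dim   # first-moment (mean) estimate per weight
        self.v = [0.0] * dim   # second-moment (uncentered variance) estimate
        self.t = 0             # step counter for bias correction

    def step(self, w, grad):
        self.t += 1
        for i, g in enumerate(grad):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            # Decoupled weight decay: applied to the weight itself,
            # not folded into the gradient as in plain L2 regularization.
            w[i] -= self.lr * (m_hat / (math.sqrt(v_hat) + self.eps) + self.wd * w[i])
        return w

def evaluate(w, features):
    """Linear evaluation function: weighted sum of battlefield features."""
    return sum(wi * fi for wi, fi in zip(w, features))

# One online update: nudge the evaluation toward an observed target value.
w = [1.0, 0.5, -0.3]               # hypothetical feature weights
opt = AdamW(dim=len(w))
features = [2.0, 1.0, 4.0]         # hypothetical state features (e.g. unit counts)
target = 3.0                       # e.g. reward plus discounted next-state value
td_error = evaluate(w, features) - target
grad = [td_error * f for f in features]  # gradient of 0.5 * td_error^2 w.r.t. w
w = opt.step(w, grad)
```

After one step the weights move in the direction that reduces the squared error between the evaluation and the target, and AdamW's adaptive per-weight step sizes stand in for the manual learning-rate tuning the paper aims to eliminate.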


URL

https://arxiv.org/abs/2501.03824

PDF

https://arxiv.org/pdf/2501.03824.pdf

