Transformers as Policies for Variable Action Environments

Abstract

In this project we demonstrate that the transformer encoder is a viable architecture for policies in variable action environments. We use it to train an agent with Proximal Policy Optimisation (PPO) on multiple maps against scripted opponents in the Gym-$\mu$RTS environment. The final agent achieves a higher return using half the computational resources of the next-best RL agent, which used the GridNet architecture. The source code and pre-trained models are publicly available.
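
To make the idea of a variable action environment concrete, here is a minimal sketch of an attention-based policy that emits one action distribution per unit, however many units are present at a given step. It is an illustration only and not the authors' model: the class name `TinyEncoderPolicy`, the feature size, the action count, and the random weights are all assumptions. It uses a single self-attention layer with a residual connection and leaves out multi-head attention, layer normalisation, the feed-forward sublayer, and any Gym-$\mu$RTS observation encoding.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyEncoderPolicy:
    """Hypothetical policy: one self-attention layer plus a per-unit action head."""

    def __init__(self, feat_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.Wq = random_matrix(feat_dim, feat_dim, rng)
        self.Wk = random_matrix(feat_dim, feat_dim, rng)
        self.Wv = random_matrix(feat_dim, feat_dim, rng)
        self.Wout = random_matrix(n_actions, feat_dim, rng)
        self.scale = math.sqrt(feat_dim)

    def forward(self, units):
        """units: one feature vector per unit; the list length may change every step."""
        q = [matvec(self.Wq, u) for u in units]
        k = [matvec(self.Wk, u) for u in units]
        v = [matvec(self.Wv, u) for u in units]
        dists = []
        for qi, ui in zip(q, units):
            # Scaled dot-product attention of this unit over all units.
            scores = [sum(a * b for a, b in zip(qi, kj)) / self.scale for kj in k]
            attn = softmax(scores)
            ctx = [sum(a * vj[d] for a, vj in zip(attn, v)) for d in range(len(ui))]
            hidden = [x + c for x, c in zip(ui, ctx)]  # residual connection
            # Each unit gets its own distribution over the action set.
            dists.append(softmax(matvec(self.Wout, hidden)))
        return dists


policy = TinyEncoderPolicy(feat_dim=4, n_actions=3)
rng = random.Random(1)
for n_units in (1, 3, 5):
    units = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n_units)]
    dists = policy.forward(units)
    print(n_units, "units ->", len(dists), "action distributions")
```

The same weights handle any number of units because attention operates on a set of unit vectors rather than on a fixed-size grid. With no positional encoding, as in this sketch, reordering the input units simply reorders the output distributions.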

URL

https://arxiv.org/abs/2301.03679

PDF

https://arxiv.org/pdf/2301.03679.pdf