Abstract
Most state-of-the-art decision systems based on Reinforcement Learning (RL) are data-driven black-box neural models, where it is often difficult to incorporate expert knowledge into the models or let experts review and validate the learned decision mechanisms. Knowledge insertion and model review are important requirements in many applications involving human health and safety. One way to bridge the gap between data-driven and knowledge-driven systems is program synthesis: replacing a neural network that outputs decisions with one that generates decision-making code in some programming language. We propose a new programming language, BF++, designed specifically for neural program synthesis in a Partially Observable Markov Decision Process (POMDP) setting, and generate programs for a number of standard OpenAI Gym benchmarks.
URL
https://arxiv.org/abs/2101.09571