Abstract
This paper presents the first algorithm for model-based offline quantum reinforcement learning and demonstrates its functionality on the cart-pole benchmark. Both the model and the policy to be optimized are implemented as variational quantum circuits. The model is trained by gradient descent to fit a pre-recorded data set, and the policy is then optimized with a gradient-free scheme that uses the model's return estimate as the fitness function. In principle, this model-based approach allows the optimization phase to be carried out entirely on a quantum computer, raising the prospect of a quantum advantage once sufficiently powerful quantum computers become available.
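The two-stage scheme the abstract describes can be sketched, purely illustratively, with a single-qubit variational circuit simulated in NumPy. Everything here is an assumption for illustration (the toy dynamics, the reward, the random-search optimizer, all names), not the authors' implementation; the paper's actual circuits and optimizer are not specified in the abstract.

```python
import numpy as np

# Toy "variational quantum circuit": |0> -> RY(x) RY(theta) -> <Z>,
# whose expectation value is cos(x + theta). This stands in for both
# the model circuit and the policy circuit of the described scheme.
def vqc(theta, x):
    return np.cos(x + theta)

# --- Stage 1: fit the model circuit to an offline data set by gradient descent.
# Assumed toy dynamics to imitate: next_state = cos(state + action + shift).
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=64)
actions = rng.uniform(-1, 1, size=64)
true_shift = 0.7
next_states = np.cos(states + actions + true_shift)

theta = 0.0
for _ in range(200):
    pred = vqc(theta, states + actions)
    # Parameter-shift rule: d<Z>/dtheta = (f(theta + pi/2) - f(theta - pi/2)) / 2
    grad_f = (vqc(theta + np.pi / 2, states + actions)
              - vqc(theta - np.pi / 2, states + actions)) / 2
    grad = np.mean(2 * (pred - next_states) * grad_f)  # d(MSE)/dtheta
    theta -= 0.5 * grad

# --- Stage 2: gradient-free policy search using model rollouts as fitness.
def model_return(phi, horizon=10):
    s, ret = 0.9, 0.0
    for _ in range(horizon):
        a = vqc(phi, s)          # policy circuit proposes an action
        s = vqc(theta, s + a)    # learned model predicts the next state
        ret += -s ** 2           # assumed reward: keep the state near zero
    return ret

# Plain random search as a stand-in for the paper's gradient-free optimizer.
best_phi, best_ret = 0.0, model_return(0.0)
for cand in rng.uniform(-np.pi, np.pi, size=200):
    r = model_return(cand)
    if r > best_ret:
        best_phi, best_ret = cand, r
```

The point of the sketch is the division of labor: only the model fit uses gradients (obtainable on hardware via the parameter-shift rule), while the policy is scored entirely through model rollouts, so no environment interaction is needed during optimization.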
URL
https://arxiv.org/abs/2404.10017