Abstract
This paper presents the first algorithm for model-based offline quantum reinforcement learning and demonstrates its functionality on the cart-pole benchmark. Both the model and the policy to be optimized are implemented as variational quantum circuits. The model is trained by gradient descent to fit a pre-recorded data set, and the policy is then optimized with a gradient-free scheme that uses the model's return estimate as the fitness function. In principle, this model-based approach allows the optimization phase to be carried out entirely on a quantum computer, raising the prospect of a quantum advantage once sufficiently powerful quantum computers become available.
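The two-stage scheme the abstract describes can be sketched, purely illustratively, with a single-qubit variational circuit simulated in NumPy. Everything here is an assumption for illustration (the toy dynamics, the reward, the random-search optimizer, all names), not the authors' implementation; the paper's actual circuits and optimizer are not specified in the abstract.

```python
import numpy as np

# Toy "variational quantum circuit": |0> -> RY(x) RY(theta) -> <Z>,
# whose expectation value is cos(x + theta). This stands in for both
# the model circuit and the policy circuit of the described scheme.
def vqc(theta, x):
    return np.cos(x + theta)

# --- Stage 1: fit the model circuit to an offline data set by gradient descent.
# Assumed toy dynamics to imitate: next_state = cos(state + action + shift).
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=64)
actions = rng.uniform(-1, 1, size=64)
true_shift = 0.7
next_states = np.cos(states + actions + true_shift)

theta = 0.0
for _ in range(200):
    pred = vqc(theta, states + actions)
    # Parameter-shift rule: d<Z>/dtheta = (f(theta + pi/2) - f(theta - pi/2)) / 2
    grad_f = (vqc(theta + np.pi / 2, states + actions)
              - vqc(theta - np.pi / 2, states + actions)) / 2
    grad = np.mean(2 * (pred - next_states) * grad_f)  # d(MSE)/dtheta
    theta -= 0.5 * grad

# --- Stage 2: gradient-free policy search using model rollouts as fitness.
def model_return(phi, horizon=10):
    s, ret = 0.9, 0.0
    for _ in range(horizon):
        a = vqc(phi, s)          # policy circuit proposes an action
        s = vqc(theta, s + a)    # learned model predicts the next state
        ret += -s ** 2           # assumed reward: keep the state near zero
    return ret

# Plain random search as a stand-in for the paper's gradient-free optimizer.
best_phi, best_ret = 0.0, model_return(0.0)
for cand in rng.uniform(-np.pi, np.pi, size=200):
    r = model_return(cand)
    if r > best_ret:
        best_phi, best_ret = cand, r
```

The point of the sketch is the division of labor: only the model fit uses gradients (obtainable on hardware via the parameter-shift rule), while the policy is scored entirely through model rollouts, so no environment interaction is needed during optimization.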
URL
https://arxiv.org/abs/2404.10017