Parametrically Retargetable Decision-Makers Tend To Seek Power

2022-06-27 17:39:23

Alexander Matt Turner, Prasad Tadepalli

arXiv_AI

Abstract

If capable AI agents are generally incentivized to seek power in service of the objectives we specify for them, then these systems will pose enormous risks, in addition to enormous benefits. In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive. However, the real world is neither fully observable, nor will agents be perfectly optimal. We consider a range of models of AI decision-making, from optimal, to random, to choices informed by learning and interacting with an environment. We discover that many decision-making functions are retargetable, and that retargetability is sufficient to cause power-seeking tendencies. Our functional criterion is simple and broad. We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power. We demonstrate the flexibility of our results by reasoning about learned policy incentives in Montezuma's Revenge. These results suggest a safety risk: Eventually, highly retargetable training procedures may train real-world agents which seek power over humans.

URL

https://arxiv.org/abs/2206.13477

PDF

https://arxiv.org/pdf/2206.13477.pdf