Abstract
Large Language Models (LLMs) have shown promise as decision-makers in dynamic settings, but their stateless nature necessitates creating a natural language representation of history. We present a unifying framework for systematically constructing natural language "state" representations for prompting LLM agents in repeated multi-agent games. Previous work on games with LLM agents has taken an ad hoc approach to encoding game history, which not only obscures the impact of state representation on agents' behavior, but also limits comparability between studies. Our framework addresses these gaps by characterizing methods of state representation along three axes: action informativeness (i.e., the extent to which the state representation captures actions played); reward informativeness (i.e., the extent to which the state representation describes rewards obtained); and prompting style (or natural language compression, i.e., the extent to which the full text history is summarized). We apply this framework to a dynamic selfish routing game, chosen because it admits a simple equilibrium both in theory and in human-subject experiments \cite{rapoport_choice_2009}. Despite the game's relative simplicity, we find that LLM agent behavior depends critically on the natural language state representation. In particular, we observe that representations which provide agents with (1) summarized, rather than complete, natural language representations of past history; (2) information about regrets, rather than raw payoffs; and (3) limited information about others' actions lead to behavior that more closely matches game-theoretic equilibrium predictions and to more stable game play by the agents. By contrast, other representations can exhibit large deviations from equilibrium, higher variation in dynamic game play over time, or both.
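To make the three axes concrete, here is a minimal sketch of how a natural-language state prompt could be parameterized along action informativeness, reward informativeness, and compression. This is an illustrative assumption, not the paper's implementation; the names (`Round`, `build_state_prompt`), the option values, and the exact wording of the generated text are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Round:
    """One round of a repeated selfish-routing game, from one agent's perspective."""
    own_route: str            # route this agent chose, e.g. "A" or "B"
    others_routes: List[str]  # routes chosen by the other agents
    payoff: float             # payoff the agent received
    best_payoff: float        # best payoff achievable in hindsight (for regret)

def build_state_prompt(history: List[Round],
                       action_info: str = "counts",  # "none" | "counts" | "full"
                       reward_info: str = "regret",  # "payoff" | "regret"
                       summarize: bool = True) -> str:
    """Render game history as a natural-language state, parameterized by the three
    axes: action informativeness, reward informativeness, and compression."""
    if not history:
        return "This is the first round; there is no history yet."

    def reward_phrase(r: Round) -> str:
        if reward_info == "regret":
            return f"your regret was {r.best_payoff - r.payoff:.2f}"
        return f"your payoff was {r.payoff:.2f}"

    def action_phrase(r: Round) -> str:
        if action_info == "full":
            return f" The other agents chose: {', '.join(r.others_routes)}."
        if action_info == "counts":
            n_same = sum(1 for x in r.others_routes if x == r.own_route)
            return f" {n_same} other agents chose the same route as you."
        return ""  # "none": no information about others' actions

    if summarize:
        # Compressed representation: a short digest instead of the full transcript.
        last = history[-1]
        if reward_info == "regret":
            avg = sum(r.best_payoff - r.payoff for r in history) / len(history)
            digest = f"Over {len(history)} rounds your average regret was {avg:.2f}."
        else:
            avg = sum(r.payoff for r in history) / len(history)
            digest = f"Over {len(history)} rounds your average payoff was {avg:.2f}."
        return (f"{digest} Last round you chose route {last.own_route} and "
                f"{reward_phrase(last)}.{action_phrase(last)}")

    # Full representation: one line per past round.
    lines = []
    for i, r in enumerate(history, start=1):
        lines.append(f"Round {i}: you chose route {r.own_route} and "
                     f"{reward_phrase(r)}.{action_phrase(r)}")
    return "\n".join(lines)
```

Under this sketch, a call such as build_state_prompt(history, action_info="none", reward_info="regret", summarize=True) would correspond to the summarized, regret-based, limited-action-information condition that the abstract reports as yielding play closest to equilibrium.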
URL
https://arxiv.org/abs/2506.15624