Towards the Scalable Evaluation of Cooperativeness in Language Models

Abstract

AI systems driven by pre-trained language models (PLMs) will likely be used increasingly to assist humans in high-stakes interactions with other agents, such as negotiation or conflict resolution. Consistent with the goals of Cooperative AI (Dafoe et al., 2020), we wish to understand and shape the multi-agent behaviours of PLMs in a pro-social manner. An important first step is the evaluation of model behaviour across diverse cooperation problems. Since desired behaviour in an interaction depends upon precise game-theoretic structure, we focus on generating scenarios with particular structures, using both crowdworkers and a language model. Our work proceeds as follows. First, we discuss key methodological issues in the generation of scenarios corresponding to particular game-theoretic structures. Second, we employ both crowdworkers and a language model to generate such scenarios. We find that the quality of generations tends to be mediocre in both cases. We additionally ask both crowdworkers and a language model to judge whether given scenarios align with their intended game-theoretic structure, finding mixed results depending on the game. Third, we provide a dataset of scenarios based on our generated data. We provide both quantitative and qualitative evaluations of UnifiedQA and GPT-3 on this dataset. We find that instruct-tuned models tend to act in a way that could be perceived as cooperative when scaled up, while other models seem to have flat scaling trends.

URL

https://arxiv.org/abs/2303.13360

PDF

https://arxiv.org/pdf/2303.13360.pdf