Abstract
Information asymmetry is pervasive in multi-agent systems and is especially evident in economics and the social sciences. In these settings, agents tailor their actions to private information in order to maximize their rewards, and this strategic behavior often introduces complexity through confounding variables. Knowledge transportability poses a second significant challenge: when experiments are difficult to conduct in the target environment, knowledge must be transferred from environments where empirical data is more readily available. Against this backdrop, this paper explores a fundamental question in online learning: can we use non-i.i.d. actions to learn about confounders even when knowledge transfer is required? We present a sample-efficient algorithm for reinforcement learning, framed within an online strategic interaction model, that accurately identifies system dynamics under information asymmetry and handles the challenges of knowledge transfer. Our method provably learns an $\epsilon$-optimal policy with a tight sample complexity of $O(1/\epsilon^2)$.
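As a reading aid for the stated rate, the following is a sketch under assumptions that the abstract does not spell out: a learned policy $\hat\pi$ is taken to be $\epsilon$-optimal when its value $V^{\hat\pi}$ is within $\epsilon$ of the optimal value $V^{\pi^*}$, $n(\epsilon)$ denotes the number of samples used, and $C$ is a hypothetical constant hiding problem-dependent factors.

$$
V^{\pi^*} - V^{\hat\pi} \le \epsilon,
\qquad
n(\epsilon) \le \frac{C}{\epsilon^2}
\;\Longrightarrow\;
n(\epsilon/2) \le \frac{4C}{\epsilon^2}.
$$

Under this reading, halving the target error raises the sample bound by a factor of four, which corresponds to a suboptimality that shrinks on the order of $1/\sqrt{n}$ after $n$ samples.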
URL
https://arxiv.org/abs/2506.09940