Abstract
AI agents are commonly trained with large datasets of demonstrations of human behavior. However, not all behaviors are equally safe or desirable. Desired characteristics for an AI agent can be expressed by assigning desirability scores, which we assume are not assigned to individual behaviors but to collective trajectories. For example, in a dataset of vehicle interactions, these scores might relate to the number of incidents that occurred. We first assess the effect of each individual agent's behavior on the collective desirability score, e.g., assessing how likely an agent is to cause incidents. This allows us to selectively imitate agents with a positive effect, e.g., only imitating agents that are unlikely to cause incidents. To enable this, we propose the concept of an agent's Exchange Value, which quantifies an individual agent's contribution to the collective desirability score. The Exchange Value is the expected change in desirability score when substituting the agent for a randomly selected agent. We propose additional methods for estimating Exchange Values from real-world datasets, enabling us to learn desired imitation policies that outperform relevant baselines. The project website can be found at this https URL.
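As a rough illustration of the idea of scoring each agent by its effect on collective desirability, the following sketch compares the mean score of groups that contain an agent with the mean score of groups that do not. This is a simplified, assumed estimator for equal-sized groups, not the estimation method proposed in the paper; the function name `exchange_value_estimates`, the toy groups, and the scores (negative incident counts) are all hypothetical.

```python
from statistics import mean


def exchange_value_estimates(groups, scores):
    """Naive per-agent estimate (an assumption, not the paper's method):
    mean score of groups containing the agent minus mean score of
    equally sized groups that do not contain it."""
    agents = sorted({agent for group in groups for agent in group})
    estimates = {}
    for agent in agents:
        with_agent = [s for g, s in zip(groups, scores) if agent in g]
        without_agent = [s for g, s in zip(groups, scores) if agent not in g]
        if not with_agent or not without_agent:
            continue  # no comparison possible, so leave the agent unscored
        estimates[agent] = mean(with_agent) - mean(without_agent)
    return estimates


# Hypothetical toy data: each tuple is one collective trajectory, and each
# score is the negative number of incidents observed in that trajectory.
groups = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("b", "d")]
scores = [0, -2, -2, -3, 0, -1]

estimates = exchange_value_estimates(groups, scores)

# Selective imitation: keep only agents whose estimated effect is positive.
selected = [agent for agent, ev in estimates.items() if ev > 0]
print(estimates)
print(selected)
```

In this toy data, agent `c` appears in the trajectories with the most incidents and receives a negative estimate, so an imitation learner restricted to `selected` would not train on its demonstrations.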
URL
https://arxiv.org/abs/2405.03735