Select to Perfect: Imitating desired behavior from large multi-agent data

2024-05-06 15:48:24
Tim Franzmeyer, Edith Elkind, Philip Torr, Jakob Foerster, Joao Henriques

Abstract

AI agents are commonly trained with large datasets of demonstrations of human behavior. However, not all behaviors are equally safe or desirable. Desired characteristics for an AI agent can be expressed by assigning desirability scores, which we assume are not assigned to individual behaviors but to collective trajectories. For example, in a dataset of vehicle interactions, these scores might relate to the number of incidents that occurred. We first assess the effect of each individual agent's behavior on the collective desirability score, e.g., assessing how likely an agent is to cause incidents. This allows us to selectively imitate agents with a positive effect, e.g., only imitating agents that are unlikely to cause incidents. To enable this, we propose the concept of an agent's Exchange Value, which quantifies an individual agent's contribution to the collective desirability score. The Exchange Value is the expected change in desirability score when substituting the agent for a randomly selected agent. We propose additional methods for estimating Exchange Values from real-world datasets, enabling us to learn desired imitation policies that outperform relevant baselines. The project website can be found at this https URL.
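
As a rough illustration of the Exchange Value definition in the abstract, the following minimal sketch computes it exactly on a toy population by enumerating every group and every substitution. It reads the definition as "score of a group containing the agent, minus the score of the same group with a different agent in its place", which is an interpretation and not necessarily the paper's estimator. The names exchange_value, risk, and score, the incident counts, and the additive score are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def exchange_value(agent, population, group_size, score):
    """Average change in score when `agent` takes the place of another agent.

    Enumerates every set of co-participants and every possible substitute,
    so it is only practical for very small populations.
    """
    others = [a for a in population if a != agent]
    diffs = []
    for rest in combinations(others, group_size - 1):
        for substitute in others:
            if substitute in rest:
                continue  # the substitute must come from outside the group
            with_agent = score(rest + (agent,))
            with_substitute = score(rest + (substitute,))
            diffs.append(with_agent - with_substitute)
    return mean(diffs)


# Hypothetical per-driver incident counts (illustrative, not from the paper).
risk = {"a": 0, "b": 1, "c": 3}


def score(group):
    # Collective desirability: fewer incidents in the group is better.
    return -sum(risk[driver] for driver in group)


population = list(risk)
values = {d: exchange_value(d, population, 2, score) for d in population}

# Selective imitation: keep only agents with a positive Exchange Value.
selected = [d for d, v in values.items() if v > 0]
print(values)
print(selected)
```

With this additive toy score, each Exchange Value reduces to the average incident count of the other drivers minus the driver's own count, so drivers "a" and "b" are kept and "c" is filtered out. Real datasets do not allow exhaustive enumeration or arbitrary substitutions, which is the case the estimation methods mentioned in the abstract address.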

URL

https://arxiv.org/abs/2405.03735

PDF

https://arxiv.org/pdf/2405.03735.pdf

