Paper Reading AI Learner

Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning

2023-01-26 15:00:23
Sriram Ganapathi Subramanian, Matthew E. Taylor, Kate Larson, Mark Crowley

Abstract

Multi-agent reinforcement learning typically suffers from sample inefficiency: learning suitable policies requires many data samples. Learning from external demonstrators is one way to mitigate this problem, but most prior approaches in this area assume the presence of a single demonstrator. Leveraging multiple knowledge sources (i.e., advisors) with expertise in distinct aspects of the environment could substantially speed up learning in complex environments. This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning. The approach leverages a two-level Q-learning architecture and extends this framework from single-agent to multi-agent settings. We provide principled algorithms that incorporate a set of advisors by evaluating the advisors at each state and then using them to guide action selection. We also provide theoretical convergence and sample complexity guarantees. Experimentally, we validate our approach in three different test-beds and show that our algorithms outperform baselines, can effectively integrate the combined expertise of different advisors, and learn to ignore bad advice.
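The two-level idea described in the abstract (evaluate advisors per state, then use them to guide action selection) can be illustrated with a minimal single-agent sketch. This is an assumption-laden simplification, not the paper's multi-agent algorithm: the class name, tables, and follow-the-advisor rule below are hypothetical, kept only to show the two levels — an advisor-evaluation table alongside the ordinary action-value table.

```python
import random
from collections import defaultdict

class AdvisedQLearner:
    """Illustrative sketch of two-level Q-learning with multiple advisors.

    Level 1 (q_eval) scores each advisor per state; level 2 (q_act) is the
    ordinary action-value table. Both are updated with standard Q-learning
    targets. Hypothetical simplification, not the paper's algorithm.
    """

    def __init__(self, actions, advisors, alpha=0.1, gamma=0.95, eps=0.1):
        self.actions = actions            # list of available actions
        self.advisors = advisors          # callables: state -> suggested action
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.q_act = defaultdict(float)   # (state, action) -> value
        self.q_eval = defaultdict(float)  # (state, advisor_idx) -> value

    def select_action(self, state):
        """Return (action, advisor_idx); advisor_idx is None if no advisor used."""
        if random.random() < self.eps:
            return random.choice(self.actions), None
        # Level 1: pick the advisor with the best evaluation in this state.
        k = max(range(len(self.advisors)),
                key=lambda i: self.q_eval[(state, i)])
        # Level 2: compare against acting greedily on our own Q-values.
        greedy = max(self.actions, key=lambda a: self.q_act[(state, a)])
        if self.q_eval[(state, k)] > self.q_act[(state, greedy)]:
            return self.advisors[k](state), k
        return greedy, None

    def update(self, state, action, reward, next_state, advisor_idx):
        """One-step Q-learning update for both levels."""
        target = reward + self.gamma * max(
            self.q_act[(next_state, a)] for a in self.actions)
        self.q_act[(state, action)] += self.alpha * (
            target - self.q_act[(state, action)])
        if advisor_idx is not None:  # credit only the advisor that was followed
            key = (state, advisor_idx)
            self.q_eval[key] += self.alpha * (target - self.q_eval[key])
```

Because a separate evaluation value is learned per advisor, an advisor whose suggestions lead to poor returns accumulates a low score and stops being followed — one plausible reading of how such a scheme can "learn to ignore bad advice."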


URL

https://arxiv.org/abs/2301.11153

PDF

https://arxiv.org/pdf/2301.11153.pdf

