Paper Reading AI Learner

Bi-CL: A Reinforcement Learning Framework for Robots Coordination Through Bi-level Optimization

2024-04-23 01:13:33
Zechen Hu, Daigo Shishika, Xuesu Xiao, Xuan Wang

Abstract

In multi-robot systems, achieving coordinated missions remains a significant challenge due to the coupled nature of coordination behaviors and the lack of global information for individual robots. To mitigate these challenges, this paper introduces a novel approach, Bi-level Coordination Learning (Bi-CL), that leverages a bi-level optimization structure within a centralized training and decentralized execution paradigm. Our bi-level reformulation decomposes the original problem into a reinforcement learning level with a reduced action space, and an imitation learning level that obtains demonstrations from a global optimizer. Both levels contribute to improved learning efficiency and scalability. We note that robots' incomplete information leads to mismatches between the two levels of learning models. To address this, Bi-CL further integrates an alignment penalty mechanism, aiming to minimize the discrepancy between the two levels without degrading their training efficiency. We introduce a running example to conceptualize the problem formulation and apply Bi-CL to two variations of this example: route-based and graph-based scenarios. Simulation results demonstrate that Bi-CL learns more efficiently and achieves performance comparable to traditional multi-agent reinforcement learning baselines for multi-robot coordination.
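The abstract describes two coupled learning levels, one trained by reinforcement learning over a reduced action space and one trained by imitation of a global optimizer, tied together by an alignment penalty that shrinks the gap between them. The following is a minimal conceptual sketch of that coupling, not the authors' implementation: the "global optimizer" is a hand-coded linear policy, both levels are linear regressors, and the toy RL objective, the weight `lam`, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(64, 3))      # toy observations
demo_w = np.array([0.5, -1.0, 2.0])    # stand-in "global optimizer" policy (assumed)
demos = states @ demo_w                # demonstrated actions for the IL level

w_rl = np.ones(3)                      # RL-level policy weights (distinct inits,
w_il = -np.ones(3)                     # so the two levels initially disagree)
lam, lr = 0.1, 0.05                    # alignment weight and step size (assumed)

def alignment_gap(w_a, w_b, s):
    """Mean squared disagreement between the two levels' actions."""
    return np.mean((s @ w_a - s @ w_b) ** 2)

for _ in range(200):
    a_rl, a_il = states @ w_rl, states @ w_il
    # RL level: a toy quadratic objective peaked at the demonstrated action,
    # standing in for policy optimization over the reduced action space.
    g_rl = states.T @ (a_rl - demos) / len(states)
    # IL level: supervised regression onto the optimizer's demonstrations.
    g_il = states.T @ (a_il - demos) / len(states)
    # Alignment penalty: gradient that pulls the two levels toward each other.
    g_align = states.T @ (a_rl - a_il) / len(states)
    w_rl -= lr * (g_rl + lam * g_align)
    w_il -= lr * (g_il - lam * g_align)

print(alignment_gap(w_rl, w_il, states))   # discrepancy shrinks toward 0
```

The design point the sketch illustrates is that the penalty enters both updates with opposite signs, so neither level is forced to chase the other unilaterally; the paper's actual mechanism operates on learned neural policies rather than these linear toys.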

Abstract (translated)

In multi-robot systems, achieving coordinated missions remains a significant challenge, because coordination behaviors are mutually coupled and individual robots lack global information. To mitigate these challenges, this paper introduces a novel approach, Bi-level Coordination Learning (Bi-CL), which leverages a bi-level optimization structure within a centralized training and decentralized execution paradigm. The bi-level reformulation decomposes the original problem into a reinforcement learning level with a reduced action space and an imitation learning level that obtains demonstrations from a global optimizer. Both levels improve learning efficiency and scalability. The robots' incomplete information leads to mismatches between the two levels of learning models; to address this, Bi-CL further integrates an alignment penalty mechanism that minimizes the discrepancy between the two levels without degrading their training efficiency. A running example illustrates the problem formulation, and Bi-CL is applied to two variations of this example: route-based and graph-based scenarios. Simulation results show that Bi-CL learns more efficiently and achieves performance comparable to traditional multi-agent reinforcement learning baselines for multi-robot coordination.

URL

https://arxiv.org/abs/2404.14649

PDF

https://arxiv.org/pdf/2404.14649.pdf

