Paper Reading AI Learner

Robustifying a Policy in Multi-Agent RL with Diverse Cooperative Behavior and Adversarial Style Sampling for Assistive Tasks

2024-03-01 08:15:18
Tayuki Osa, Tatsuya Harada

Abstract

Autonomous assistance of people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies showed that assistive tasks can be formulated as multi-agent RL, wherein there are two agents: a caregiver and a care-receiver. However, policies trained in multi-agent RL are often sensitive to the policies of other agents. In such a case, a trained caregiver's policy may not work for different care-receivers. To alleviate this issue, we propose a framework that learns a robust caregiver's policy by training it for diverse care-receiver responses. In our framework, diverse care-receiver responses are autonomously learned through trials and errors. In addition, to robustify the care-giver's policy, we propose a strategy for sampling a care-receiver's response in an adversarial manner during the training. We evaluated the proposed method using tasks in an Assistive Gym. We demonstrate that policies trained with a popular deep RL method are vulnerable to changes in policies of other agents and that the proposed framework improves the robustness against such changes.

Abstract (translated)

残疾人的自主辅助是一个最有前途的自机器人系统的应用之一。最近,在医疗领域使用深度强化学习(RL)进行研究,已经取得了一些鼓舞人心的结果。以前的研究表明,辅助任务可以表示为多智能体强化学习(MAS)中的两个智能体:照顾者和接收者。然而,在多智能体强化学习中训练的政策通常对其他智能体的策略敏感。在这种情况下,经过训练的照顾者的策略可能对不同的接收者无效。为了减轻这个问题,我们提出了一个学习 robust 照顾者策略的框架,通过为不同接收者反应进行训练。在我们的框架中,通过尝试和错误,自适应地学习不同的照顾者策略。此外,为了增强照顾者的策略的稳健性,我们提出了在训练过程中以对抗方式采样接收者反应的策略。我们在 Assistive Gym 等任务中评估了所提出的方法。我们证明了使用流行 deep RL 方法训练的政策容易受到其他智能体策略变化的影响,而提出的框架能提高对这种变化的鲁棒性。

URL

https://arxiv.org/abs/2403.00344

PDF

https://arxiv.org/pdf/2403.00344.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot