Abstract
Assisting people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies have shown that assistive tasks can be formulated as a multi-agent RL problem with two agents: a caregiver and a care-receiver. However, policies trained in multi-agent RL are often sensitive to the policies of the other agents. In such a case, a trained caregiver's policy may not work for different care-receivers. To alleviate this issue, we propose a framework that learns a robust caregiver's policy by training it against diverse care-receiver responses. In our framework, diverse care-receiver responses are autonomously learned through trial and error. In addition, to robustify the caregiver's policy, we propose a strategy for sampling a care-receiver's response in an adversarial manner during training. We evaluated the proposed method on tasks in Assistive Gym. We demonstrate that policies trained with a popular deep RL method are vulnerable to changes in the policies of other agents and that the proposed framework improves robustness against such changes.
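To make the idea of adversarial response sampling concrete, here is a minimal Python sketch under stated assumptions: a pool of pre-learned care-receiver response policies and two placeholder helpers (evaluate_return, update_caregiver) stand in for the environment rollout and the deep RL update. All names and the selection rule shown are illustrative assumptions, not the paper's actual implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical placeholder types and helpers -- illustrative assumptions,
# not the paper's API. A Policy maps an observation to an action; the two
# helpers stand in for the simulator rollout (e.g., an Assistive Gym task)
# and the deep RL update step.
Policy = Callable[[Sequence[float]], Sequence[float]]

def evaluate_return(caregiver: Policy, care_receiver: Policy) -> float:
    """Roll out one episode and return the caregiver's cumulative reward."""
    raise NotImplementedError  # placeholder

def update_caregiver(caregiver: Policy, care_receiver: Policy) -> Policy:
    """Perform one RL update of the caregiver against this care-receiver."""
    raise NotImplementedError  # placeholder

def train_robust_caregiver(caregiver: Policy,
                           care_receiver_pool: List[Policy],
                           iterations: int = 1000,
                           eval_episodes: int = 3) -> Policy:
    """Adversarial sampling sketch: at each iteration, estimate the
    caregiver's return against every candidate care-receiver response,
    pick the one it handles worst, and update the caregiver against it."""
    for _ in range(iterations):
        returns = []
        for cr in care_receiver_pool:
            # Monte-Carlo estimate of the caregiver's return vs. this response.
            avg = sum(evaluate_return(caregiver, cr)
                      for _ in range(eval_episodes)) / eval_episodes
            returns.append(avg)
        # Adversarial choice: the care-receiver response with the lowest return.
        worst = care_receiver_pool[returns.index(min(returns))]
        caregiver = update_caregiver(caregiver, worst)
    return caregiver
```

Selecting the response with the lowest estimated caregiver return is one simple way to bias training toward the hardest partner; the sampling strategy used in the paper may differ in its details.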
URL
https://arxiv.org/abs/2403.00344