Abstract
Dialogue policy plays an important role in task-oriented spoken dialogue systems: it determines how the system responds to users. Recently proposed deep reinforcement learning (DRL) approaches have been used for policy optimization. However, these deep models remain challenging for two reasons: 1) many DRL-based policies are not sample-efficient, and 2) most models cannot transfer policies between different domains. In this paper, we propose a universal framework, AgentGraph, to tackle these two problems. AgentGraph combines a GNN-based architecture with a DRL-based algorithm and can be regarded as a multi-agent reinforcement learning approach. Each agent corresponds to a node in a graph defined according to the dialogue domain ontology. When making a decision, each agent can communicate with its neighbors on the graph. Under the AgentGraph framework, we further propose a Dual GNN-based dialogue policy, which implicitly decomposes the decision in each turn into a high-level global decision and a low-level local decision. Experiments show that AgentGraph models significantly outperform traditional reinforcement learning approaches on most of the 18 tasks of the PyDial benchmark. Moreover, when transferred from a source task to a target task, these models not only have acceptable initial performance but also converge much faster on the target task.
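The core idea, as described above, is that each agent is a node in a graph and exchanges messages with its neighbors before choosing an action. A minimal sketch of that message-passing-then-act loop is shown below; this is an illustrative toy, not the paper's actual implementation, and the graph, dimensions, and weight matrices (`W_msg`, `W_q`) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical agent graph: 3 agents (e.g. one per ontology slot),
# given as an adjacency list of neighbor indices.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
hidden_dim, n_actions = 4, 3

# Per-agent hidden states and shared (weight-tied) parameters.
h = {i: rng.standard_normal(hidden_dim) for i in neighbors}
W_msg = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_q = rng.standard_normal((hidden_dim, n_actions)) * 0.1

def communicate(states, steps=1):
    """Run one or more rounds of message passing over the agent graph."""
    for _ in range(steps):
        new_states = {}
        for i, nbrs in neighbors.items():
            # Mean-aggregate transformed neighbor states, then update node i.
            msg = np.mean([W_msg @ states[j] for j in nbrs], axis=0)
            new_states[i] = np.tanh(states[i] + msg)
        states = new_states
    return states

h = communicate(h)
# After communication, each agent selects its local action greedily
# from its own action-value head.
actions = {i: int(np.argmax(h[i] @ W_q)) for i in h}
```

Weight tying across agents is what makes this kind of architecture transferable: a policy trained on one ontology graph can be applied to a graph with a different number of nodes.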
URL
https://arxiv.org/abs/1905.11259