Paper Reading AI Learner

AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning

2019-05-27 14:27:13
Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gasic, Kai Yu

Abstract

Dialogue policy plays an important role in task-oriented spoken dialogue systems: it determines how to respond to users. Recently proposed deep reinforcement learning (DRL) approaches have been used for policy optimization. However, these deep models remain challenging for two reasons: 1) many DRL-based policies are not sample-efficient; 2) most models lack the capability of policy transfer between different domains. In this paper, we propose a universal framework, AgentGraph, to tackle these two problems. The proposed AgentGraph combines a GNN-based architecture with a DRL-based algorithm, and can be regarded as a multi-agent reinforcement learning approach. Each agent corresponds to a node in a graph, which is defined according to the dialogue domain ontology. When making a decision, each agent can communicate with its neighbors on the graph. Under the AgentGraph framework, we further propose a Dual GNN-based dialogue policy, which implicitly decomposes the decision in each turn into a high-level global decision and a low-level local decision. Experiments show that AgentGraph models significantly outperform traditional reinforcement learning approaches on most of the 18 tasks of the PyDial benchmark. Moreover, when transferred from a source task to a target task, these models not only have acceptable initial performance but also converge much faster on the target task.
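To make the architecture described above concrete, the following is a minimal sketch of an AgentGraph-style decision step: each ontology slot is an agent (a graph node), agents exchange one round of messages with their neighbors, and the turn's decision is split into a high-level choice of agent and a low-level choice of that agent's action. All names, dimensions, and the random weights are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3    # e.g. one agent per ontology slot (assumed)
STATE_DIM = 4   # per-agent belief-state features (assumed)
N_ACTIONS = 2   # local actions per agent, e.g. request/confirm (assumed)

# Adjacency matrix from the domain ontology (here: a simple chain of slots).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])

# Randomly initialized weights stand in for the learned GNN / Q-network.
W_msg = rng.standard_normal((STATE_DIM, STATE_DIM)) * 0.1
W_q = rng.standard_normal((STATE_DIM, N_ACTIONS)) * 0.1

def agentgraph_step(states):
    """One message-passing round followed by the two-level decision."""
    # Each agent aggregates its neighbors' transformed states.
    messages = adj @ (states @ W_msg)
    hidden = np.tanh(states + messages)   # updated node embeddings
    q = hidden @ W_q                      # per-agent local Q-values
    # High-level (global) decision: which agent acts this turn.
    agent = int(np.argmax(q.max(axis=1)))
    # Low-level (local) decision: which of that agent's actions to take.
    action = int(np.argmax(q[agent]))
    return agent, action

states = rng.standard_normal((N_AGENTS, STATE_DIM))
agent, action = agentgraph_step(states)
```

In training, the Q-network weights would be optimized with a DRL algorithm (e.g. Q-learning), and transfer between domains amounts to reusing the shared message-passing weights on a new ontology graph.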


URL

https://arxiv.org/abs/1905.11259

PDF

https://arxiv.org/pdf/1905.11259.pdf

