Abstract
Semantic parsing over multiple knowledge bases lets a parser exploit structural similarities between programs across domains. The fundamental challenge, however, is obtaining the high-quality (utterance, program) annotations across domains that training such models requires. To overcome this, we propose a novel framework for building a unified multi-domain semantic parser trained only with weak supervision (denotations). Weakly supervised training is particularly arduous because the program search space grows exponentially in a multi-domain setting. To address this, we incorporate a multi-policy distillation mechanism: we first train domain-specific semantic parsers (teachers) using weak supervision, without ground-truth programs, and then train a single unified parser (student) from the domain-specific policies obtained from these teachers. The resulting semantic parser is not only compact but also generalizes better and generates more accurate programs. Moreover, it does not require the user to provide a domain label at query time. On the standard Overnight dataset, which contains multiple domains, we demonstrate that the proposed model improves denotation accuracy by 20% over baseline techniques.
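The teacher-to-student step can be illustrated with a toy sketch. The abstract does not state the distillation objective, so this example assumes a common choice, minimizing KL(teacher || student) over each teacher's action distribution. The tabular student, the example utterances, the three-action space, and all function names are hypothetical and stand in for the neural parsers used in the paper.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient step on KL(teacher || student) w.r.t. the student logits.

    For a softmax student the gradient is (student_probs - teacher_probs).
    """
    student_probs = softmax(student_logits)
    return [z - lr * (q - p)
            for z, q, p in zip(student_logits, student_probs, teacher_probs)]

# Hypothetical teacher policies: one per domain, each mapping an utterance
# to a distribution over three candidate program actions.
teachers = {
    "calendar": {"meetings after 3pm": [0.8, 0.1, 0.1]},
    "restaurants": {"restaurants rated 5 stars": [0.1, 0.7, 0.2]},
}

def distill(teachers, steps=200, lr=0.5):
    """Train one student on all teachers; the student is keyed by utterance
    only, so it needs no domain label at query time."""
    student = {}
    for _ in range(steps):
        for domain_policy in teachers.values():
            for utterance, teacher_probs in domain_policy.items():
                logits = student.setdefault(utterance, [0.0] * len(teacher_probs))
                student[utterance] = distill_step(logits, teacher_probs, lr)
    return student

student = distill(teachers)
```

After training, `softmax(student[utterance])` closely matches the corresponding teacher's distribution for each utterance, even though the student is a single table shared across both domains.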
URL
https://arxiv.org/abs/1906.05062