Learning to Synthesize Data for Semantic Parsing

2021-04-12 21:24:02

Bailin Wang, Wenpeng Yin, Xi Victoria Lin, Caiming Xiong

arXiv_CL

Abstract

Synthesizing data for semantic parsing has gained increasing attention recently. However, most methods require handcrafted (high-precision) rules in their generative process, hindering the exploration of diverse unseen data. In this work, we propose a generative model that features a (non-neural) PCFG that models the composition of programs (e.g., SQL), and a BART-based translation model that maps a program to an utterance. Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand. Moreover, explicitly modeling compositions using PCFG leads to a better exploration of unseen programs, and thus generates more diverse data. We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider, respectively. Our empirical results show that the synthesized data generated from our model can substantially help a semantic parser achieve better compositional and domain generalization.
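
As an illustration of the first stage described in the abstract, the following is a minimal sketch of how a PCFG over programs could be estimated from existing derivations by relative-frequency counting and then sampled to produce new programs. The grammar symbols (`Q`, `COL`, `TAB`, `COND`, `VAL`), the toy derivations, and the function names `estimate_pcfg` and `sample_program` are all assumptions made for this example; they are not the grammar or code used by the authors. The second stage, translating a sampled program into an utterance with a BART-based model, is not shown.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy derivations: each is a list of (lhs, rhs) rule applications.
# Illustrative only; this is not the SQL grammar from the paper.
TOY_DERIVATIONS = [
    [("Q", ("SELECT", "COL", "FROM", "TAB")),
     ("COL", ("name",)), ("TAB", ("city",))],
    [("Q", ("SELECT", "COL", "FROM", "TAB", "WHERE", "COND")),
     ("COL", ("population",)), ("TAB", ("state",)),
     ("COND", ("COL", ">", "VAL")), ("COL", ("area",)), ("VAL", ("1000",))],
    [("Q", ("SELECT", "COL", "FROM", "TAB")),
     ("COL", ("area",)), ("TAB", ("state",))],
]


def estimate_pcfg(derivations):
    """Maximum-likelihood rule probabilities: count(lhs -> rhs) / count(lhs)."""
    counts = defaultdict(Counter)
    for derivation in derivations:
        for lhs, rhs in derivation:
            counts[lhs][rhs] += 1
    return {
        lhs: {rhs: c / sum(rhs_counts.values()) for rhs, c in rhs_counts.items()}
        for lhs, rhs_counts in counts.items()
    }


def sample_program(pcfg, symbol, rng):
    """Expand `symbol` top-down, drawing one rule per nonterminal."""
    if symbol not in pcfg:
        # Symbols without rules are treated as terminals.
        return [symbol]
    rules, weights = zip(*pcfg[symbol].items())
    rhs = rng.choices(rules, weights=weights, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample_program(pcfg, child, rng))
    return tokens


if __name__ == "__main__":
    pcfg = estimate_pcfg(TOY_DERIVATIONS)
    rng = random.Random(0)
    for _ in range(3):
        print(" ".join(sample_program(pcfg, "Q", rng)))
```

Because each nonterminal is expanded independently, the sampler can combine rules that never co-occurred in a single training derivation (for example, a `WHERE` clause with the `city` table), which is the kind of compositional recombination the abstract attributes to explicit PCFG modeling.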

URL

https://arxiv.org/abs/2104.05827

PDF

https://arxiv.org/pdf/2104.05827.pdf