Generating Wikipedia Article Sections from Diverse Data Sources

2020-12-29 19:35:34

Mingda Chen, Sam Wiseman, Kevin Gimpel

arXiv_CL Text_Generation

Abstract

Datasets for data-to-text generation typically focus either on multi-domain, single-sentence generation or on single-domain, long-form generation. In this work, we create a large-scale dataset, WikiTableT, that pairs Wikipedia sections with their corresponding tabular data and various metadata. WikiTableT contains millions of instances, covering a broad range of topics, as well as a variety of flavors of generation tasks with different levels of flexibility. We benchmark several training and decoding strategies on WikiTableT. Our qualitative analysis shows that the best approaches can generate fluent, high-quality text, but they sometimes struggle with coherence.

URL

https://arxiv.org/abs/2012.14919

PDF

https://arxiv.org/pdf/2012.14919.pdf