Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval

2021-04-18 10:20:07

Devang Kulshreshtha, Robert Belfer, Iulian Vlad Serban, Siva Reddy

Abstract

In this paper, we propose a new domain adaptation method called $\textit{back-training}$, a superior alternative to self-training. Self-training produces synthetic training data in which quality inputs are aligned with noisy outputs, whereas back-training produces noisy inputs aligned with quality outputs. Our experiments on unsupervised domain adaptation of question generation and passage retrieval models from the $\textit{Natural Questions}$ domain to the machine learning domain show that back-training outperforms self-training by a large margin: 9.3 BLEU-1 points on generation and 7.9 accuracy points on top-1 retrieval. We release $\textit{MLQuestions}$, a domain-adaptation dataset for the machine learning domain containing 50K unaligned passages, 35K unaligned questions, and 3K aligned passage-question pairs. Our data and code are available at this https URL
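The contrast between the two kinds of synthetic data can be illustrated with a minimal sketch for the question generation case, where the model input is a passage and the output is a question. This is not the authors' released code. It assumes the noisy side of each pair comes from a model trained on the source domain: a question generator for self-training, and a passage retriever running in the reverse direction for back-training. All function names and the toy stand-in models below are hypothetical.

```python
from typing import Callable, List, Tuple

# A training pair for question generation: (input passage, output question).
Pair = Tuple[str, str]


def self_training_pairs(
    passages: List[str], generate_question: Callable[[str], str]
) -> List[Pair]:
    """Real target-domain passages (quality inputs) paired with
    model-generated questions (noisy outputs)."""
    return [(p, generate_question(p)) for p in passages]


def back_training_pairs(
    questions: List[str], retrieve_passage: Callable[[str], str]
) -> List[Pair]:
    """Model-retrieved passages (noisy inputs) paired with
    real target-domain questions (quality outputs)."""
    return [(retrieve_passage(q), q) for q in questions]


# Toy stand-ins for source-domain models (assumptions, for illustration only).
def toy_generate_question(passage: str) -> str:
    # Crude "generator": asks about the first word of the passage.
    return "What is " + passage.split()[0].lower() + "?"


def make_toy_retriever(passages: List[str]) -> Callable[[str], str]:
    # Crude "retriever": picks the passage with the largest word overlap.
    def retrieve(question: str) -> str:
        q_words = set(question.lower().rstrip("?").split())
        return max(passages, key=lambda p: len(q_words & set(p.lower().split())))

    return retrieve


# Unaligned target-domain data: passages and questions are not paired.
passages = [
    "Dropout randomly zeroes activations during training.",
    "Momentum accumulates past gradients to speed up descent.",
]
questions = [
    "How does dropout regularize training?",
    "Why does momentum speed up gradient descent?",
]

self_pairs = self_training_pairs(passages, toy_generate_question)
back_pairs = back_training_pairs(questions, make_toy_retriever(passages))
```

In this sketch, every question in `back_pairs` is a real question, so the generator is fine-tuned toward outputs drawn from the true target distribution, while every question in `self_pairs` was produced by the model itself. For the retrieval task the roles swap: the question becomes the input and the passage the output.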

URL

https://arxiv.org/abs/2104.08801

PDF

https://arxiv.org/pdf/2104.08801.pdf