As good as new. How to successfully recycle English GPT-2 to make models for other languages

Abstract
Abstract (translated)
URL
PDF

Abstract

Large generative language models have been very successful for English, but other languages lag behind due to data and computational limitations. We propose a method that may overcome these problems by adapting existing pre-trained language models to new languages. Specifically, we describe the adaptation of English GPT-2 to Italian and Dutch by retraining lexical embeddings without tuning the Transformer layers. As a result, we obtain lexical embeddings for Italian and Dutch that are aligned with the original English lexical embeddings and induce a bilingual lexicon from this alignment. Additionally, we show how to scale up complexity by transforming relearned lexical embeddings of GPT-2 small to the GPT-2 medium embedding space. This method minimises the amount of training and prevents losing information during adaptation that was learned by GPT-2. English GPT-2 models with relearned lexical embeddings can generate realistic sentences in Italian and Dutch, but on average these sentences are still identifiable as artificial by humans. Based on perplexity scores and human judgements, we find that generated sentences become more realistic with some additional full model finetuning, especially for Dutch. For Italian, we see that they are evaluated on par with sentences generated by a GPT-2 model fully trained from scratch. Our work can be conceived as a blueprint for training GPT-2s for other languages, and we provide a 'recipe' to do so.

Abstract (translated)

URL

https://arxiv.org/abs/2012.05628

PDF

https://arxiv.org/pdf/2012.05628.pdf