BERTGEN: Multi-task Generation through BERT

2021-06-07 10:17:45

Faidon Mitzalis, Ozan Caglayan, Pranava Madhyastha, Lucia Specia

arXiv_CL

Abstract

We present BERTGEN, a novel generative, decoder-only model which extends BERT by fusing the multimodal and multilingual pretrained models VL-BERT and M-BERT, respectively. BERTGEN is auto-regressively trained for language generation tasks, namely image captioning, machine translation and multimodal machine translation, under a multitask setting. With a comprehensive set of evaluations, we show that BERTGEN outperforms many strong baselines across the tasks explored. We also show BERTGEN's ability for zero-shot language generation, where its performance is competitive with that of supervised counterparts. Finally, we conduct ablation studies which demonstrate that BERTGEN substantially benefits from multi-tasking and effectively transfers relevant inductive biases from the pretrained models.

URL

https://arxiv.org/abs/2106.03484

PDF

https://arxiv.org/pdf/2106.03484.pdf