Abstract
While impressive performance has been achieved on the task of Answer Sentence Selection (AS2) for English, the same does not hold for languages that lack large labeled datasets. In this work, we propose Cross-Lingual Knowledge Distillation (CLKD) from a strong English AS2 teacher as a method to train AS2 models for low-resource languages without the need for labeled data in the target language. To evaluate our method, we introduce 1) Xtr-WikiQA, a translation-based WikiQA dataset for 9 additional languages, and 2) TyDi-AS2, a multilingual AS2 dataset with over 70K questions spanning 8 typologically diverse languages. We conduct extensive experiments on Xtr-WikiQA and TyDi-AS2 with multiple teachers, diverse monolingual and multilingual pretrained language models (PLMs) as students, and both monolingual and multilingual training. The results demonstrate that CLKD outperforms or rivals both supervised fine-tuning with the same amount of labeled data and a pipeline that combines machine translation with the teacher model. Our method can potentially enable stronger AS2 models for low-resource languages, while TyDi-AS2 can serve as the largest multilingual AS2 dataset for further studies in the research community.
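The sketch below illustrates the general idea of cross-lingual knowledge distillation for AS2 as described in the abstract: an English teacher scores question-candidate pairs, and a student for the target language is trained to mimic those soft scores, so no target-language labels are required. It is a minimal illustration assuming a standard soft-label distillation loss and parallel (e.g., translated) question-candidate pairs; the model names, tokenization, and loss shown here are assumptions, not the paper's exact configuration.

```python
# Minimal sketch of cross-lingual knowledge distillation (CLKD) for AS2.
# Assumptions: an English AS2 teacher scores English QA pairs, and a
# multilingual student is trained on parallel target-language pairs to
# match the teacher's soft scores. Model choices below are hypothetical.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

teacher_name = "roberta-large"      # assumed English AS2 teacher backbone
student_name = "xlm-roberta-base"   # assumed multilingual student backbone

teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
student_tok = AutoTokenizer.from_pretrained(student_name)
teacher = AutoModelForSequenceClassification.from_pretrained(
    teacher_name, num_labels=1).eval()
student = AutoModelForSequenceClassification.from_pretrained(
    student_name, num_labels=1)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def distill_step(question_en, candidates_en, question_tgt, candidates_tgt):
    """One KD step: the teacher scores English question-candidate pairs,
    and the student learns to reproduce those scores on the parallel
    target-language pairs (no target-language labels needed)."""
    with torch.no_grad():
        t_inputs = teacher_tok(
            [question_en] * len(candidates_en), candidates_en,
            padding=True, truncation=True, return_tensors="pt")
        # Teacher soft probabilities serve as the supervision signal.
        t_scores = torch.sigmoid(teacher(**t_inputs).logits.squeeze(-1))

    s_inputs = student_tok(
        [question_tgt] * len(candidates_tgt), candidates_tgt,
        padding=True, truncation=True, return_tensors="pt")
    s_logits = student(**s_inputs).logits.squeeze(-1)

    # Soft-label distillation loss: binary cross-entropy against the
    # teacher's scores for each candidate sentence.
    loss = F.binary_cross_entropy_with_logits(s_logits, t_scores)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```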
URL
https://arxiv.org/abs/2305.16302