
Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages

2023-05-25 17:56:04
Shivanshu Gupta, Yoshitomo Matsubara, Ankit Chadha, Alessandro Moschitti

Abstract

While impressive performance has been achieved on the task of Answer Sentence Selection (AS2) for English, the same does not hold for languages that lack large labeled datasets. In this work, we propose Cross-Lingual Knowledge Distillation (CLKD) from a strong English AS2 teacher as a method to train AS2 models for low-resource languages without the need for labeled data in the target language. To evaluate our method, we introduce 1) Xtr-WikiQA, a translation-based WikiQA dataset covering 9 additional languages, and 2) TyDi-AS2, a multilingual AS2 dataset with over 70K questions spanning 8 typologically diverse languages. We conduct extensive experiments on Xtr-WikiQA and TyDi-AS2 with multiple teachers, diverse monolingual and multilingual pretrained language models (PLMs) as students, and both monolingual and multilingual training. The results demonstrate that CLKD outperforms or rivals even supervised fine-tuning with the same amount of labeled data, as well as a pipeline that combines machine translation with the teacher model. Our method can potentially enable stronger AS2 models for low-resource languages, while TyDi-AS2 can serve as the largest multilingual AS2 dataset for further studies in the research community.
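The core idea of CLKD as described above is that an English teacher scores question-candidate pairs, and a (multilingual) student is trained on the corresponding target-language pairs to reproduce those soft labels, so no target-language annotations are required. Below is a minimal sketch of one such distillation step, assuming PyTorch and Hugging Face Transformers; the model checkpoints, the parallel English/target-language batch, and the KL-divergence soft-label loss are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal CLKD sketch for AS2: an English teacher produces soft labels on
# English (question, candidate) pairs; a multilingual student is trained to
# match them on the parallel target-language pairs.
# Checkpoint names below are placeholders, not the paper's models.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

teacher_name = "roberta-large"     # placeholder for a teacher fine-tuned on English AS2
student_name = "xlm-roberta-base"  # placeholder multilingual student

teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
student_tok = AutoTokenizer.from_pretrained(student_name)
teacher = AutoModelForSequenceClassification.from_pretrained(teacher_name, num_labels=2).eval()
student = AutoModelForSequenceClassification.from_pretrained(student_name, num_labels=2)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def distill_step(en_pairs, tgt_pairs):
    """One CLKD update: teacher scores English (question, candidate) pairs,
    the student is trained on the parallel target-language pairs to match them."""
    with torch.no_grad():
        t_inputs = teacher_tok([q for q, _ in en_pairs], [c for _, c in en_pairs],
                               padding=True, truncation=True, return_tensors="pt")
        t_probs = F.softmax(teacher(**t_inputs).logits, dim=-1)  # soft labels

    s_inputs = student_tok([q for q, _ in tgt_pairs], [c for _, c in tgt_pairs],
                           padding=True, truncation=True, return_tensors="pt")
    s_log_probs = F.log_softmax(student(**s_inputs).logits, dim=-1)

    # KL divergence between student and teacher distributions, a common KD loss.
    loss = F.kl_div(s_log_probs, t_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: parallel English / target-language versions of the same question-candidate pairs.
en_batch = [("Who wrote Hamlet?", "Hamlet was written by William Shakespeare.")]
de_batch = [("Wer schrieb Hamlet?", "Hamlet wurde von William Shakespeare geschrieben.")]
print(distill_step(en_batch, de_batch))
```

In practice, the teacher would be a model already fine-tuned for English AS2 (e.g., on WikiQA), and at inference time the student's positive-class score for each (question, candidate) pair would be used to rank the candidate sentences.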


URL

https://arxiv.org/abs/2305.16302

PDF

https://arxiv.org/pdf/2305.16302.pdf

