Multilingual and cross-lingual document classification: A meta-learning approach

2021-01-27 10:22:56

Niels van der Heijden, Helen Yannakoudakis, Pushkar Mishra, Ekaterina Shutova

arXiv_CL

Abstract

The great majority of languages in the world are considered under-resourced for the successful application of deep learning methods. In this work, we propose a meta-learning approach to document classification in limited-resource settings and demonstrate its effectiveness in two different settings: few-shot, cross-lingual adaptation to previously unseen languages; and multilingual joint training when limited target-language data is available during training. We conduct a systematic comparison of several meta-learning methods, investigate multiple settings in terms of data availability, and show that meta-learning thrives in settings with a heterogeneous task distribution. We propose a simple yet effective adjustment to existing meta-learning methods that allows for better and more stable learning, and we set a new state of the art on several languages while performing on par on others, using only a small amount of labeled data.
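The abstract does not say how training tasks are constructed. Episodic meta-learning methods commonly train on small N-way K-shot tasks, where each task has a support set used for adaptation and a query set used for the meta-update. If languages are treated as the source of task heterogeneity, one plausible (assumed) choice is to draw every episode from a single language. The sketch below covers only that generic sampling step. It does not reproduce the authors' procedure or their proposed adjustment. `sample_episode` and the `data_by_language` layout (a dict from language code to `(document, label)` pairs) are hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(data_by_language, n_way, k_shot, n_query, rng):
    """Build one N-way K-shot episode from a single, randomly chosen language.

    Hypothetical helper for illustration only.
    data_by_language: {language: [(document, label), ...]}
    Returns (language, support, query); support/query are lists of (document, label).
    """
    # Each episode is one task: pick a language, then N of its classes.
    language = rng.choice(sorted(data_by_language))
    by_label = defaultdict(list)
    for doc, label in data_by_language[language]:
        by_label[label].append(doc)

    # Only classes with enough documents for both support and query are eligible.
    eligible = sorted(
        label for label, docs in by_label.items() if len(docs) >= k_shot + n_query
    )
    if len(eligible) < n_way:
        raise ValueError(f"{language!r} has fewer than {n_way} usable classes")
    classes = rng.sample(eligible, n_way)

    support, query = [], []
    for label in classes:
        # Draw disjoint documents: the first k_shot go to support, the rest to query.
        docs = rng.sample(by_label[label], k_shot + n_query)
        support += [(doc, label) for doc in docs[:k_shot]]
        query += [(doc, label) for doc in docs[k_shot:]]
    return language, support, query


if __name__ == "__main__":
    # Toy corpus with made-up placeholder documents, 4 classes per language.
    toy = {
        "de": [(f"de_doc{i}", i % 4) for i in range(40)],
        "fr": [(f"fr_doc{i}", i % 4) for i in range(40)],
    }
    lang, support, query = sample_episode(toy, n_way=2, k_shot=3, n_query=2,
                                          rng=random.Random(0))
    print(lang, len(support), len(query))
```

Drawing one language per episode keeps each task monolingual, so the meta-learner must adapt across languages from one episode to the next. Mixing languages within an episode is an equally plausible alternative that the abstract does not rule out.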

URL

https://arxiv.org/abs/2101.11302

PDF

https://arxiv.org/pdf/2101.11302.pdf