Paper Reading AI Learner

Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language

2024-04-27 08:53:58
Tsimur Hadeliya, Dariusz Kajtoch

Abstract

We introduce a few-shot benchmark consisting of 7 different classification tasks native to the Polish language. We conducted an empirical comparison with 0 and 16 shots between fine-tuning, linear probing, SetFit, and in-context learning (ICL) using various pre-trained commercial and open-source models. Our findings reveal that ICL achieves the best performance overall, with commercial models such as GPT-3.5 and GPT-4 leading. However, there remains a significant 14-percentage-point gap between our best few-shot learning score and the performance of HerBERT-large fine-tuned on the entire training dataset. Among the techniques, SetFit emerges as the second-best approach, closely followed by linear probing. We observed the worst and most unstable performance with non-linear head fine-tuning. Results for ICL indicate that continual pre-training of models like Mistral-7b or Llama-2-13b on Polish corpora is beneficial, as confirmed by the improved performance of Bielik-7b and Trurl-13b, respectively. To further support experiments in few-shot learning for Polish, we are releasing handcrafted templates for ICL.
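The abstract evaluates 16-shot in-context learning with handcrafted templates. As a minimal sketch of how a k-shot ICL prompt for a Polish classification task is typically assembled (the instruction wording, field names, and labels below are illustrative assumptions, not the templates released by the authors):

```python
def build_icl_prompt(instruction, demonstrations, query):
    """Concatenate a task instruction, labeled demonstrations, and the
    unlabeled query into a single few-shot prompt string."""
    lines = [instruction, ""]
    # Each demonstration is a (text, label) pair shown to the model.
    for text, label in demonstrations:
        lines.append(f"Tekst: {text}")
        lines.append(f"Etykieta: {label}")
        lines.append("")
    # The query is appended with an empty label slot for the model to fill.
    lines.append(f"Tekst: {query}")
    lines.append("Etykieta:")
    return "\n".join(lines)

# Hypothetical 2-shot sentiment example (the paper uses up to 16 shots).
demos = [
    ("Świetny produkt, polecam!", "pozytywna"),
    ("Bardzo słaba jakość wykonania.", "negatywna"),
]
prompt = build_icl_prompt(
    "Sklasyfikuj wydźwięk tekstu jako pozytywna lub negatywna.",
    demos,
    "Obsługa klienta była wzorowa.",
)
print(prompt)
```

The completed prompt is then sent to a model such as GPT-3.5, GPT-4, Mistral-7b, or Bielik-7b, and the generated continuation is mapped back to a class label.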

URL

https://arxiv.org/abs/2404.17832

PDF

https://arxiv.org/pdf/2404.17832.pdf
