Abstract
In this article, we use probing to investigate phenomena that occur during fine-tuning and knowledge distillation of a BERT-based natural language understanding (NLU) model. Our ultimate goal was to use probing to better understand practical production problems and, consequently, to build better NLU models. We designed experiments to see how fine-tuning changes the linguistic capabilities of BERT, what the optimal size of the fine-tuning dataset is, and how much information is contained in a distilled NLU model based on a tiny Transformer. The results of the experiments show that the probing paradigm in its current form is not well suited to answer such questions. Structural, Edge, and Conditional probes do not take into account how easy it is to decode the probed information. Consequently, we conclude that quantifying the decodability of information is critical for many practical applications of the probing paradigm.
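To make the probing setup concrete, the following is a minimal sketch of a diagnostic probe in the spirit the abstract describes: a small linear classifier trained on frozen BERT representations to predict a linguistic property. The toy sentence, tag set, and hyperparameters are illustrative assumptions, not taken from the paper; the paper's Structural, Edge, and Conditional probes are more elaborate variants of the same idea.

# A minimal probing sketch (assumptions: bert-base-uncased, a toy
# POS-style tagging task, hypothetical labels). Only the linear probe
# is trained; the encoder stays frozen.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()  # frozen encoder: probe accuracy measures what its representations contain

# Hypothetical toy data: one sentence with a coarse tag per word.
sentence = "the cat sat"
word_tags = ["DET", "NOUN", "VERB"]
tag_to_id = {"DET": 0, "NOUN": 1, "VERB": 2}

inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # shape (1, seq_len, 768)

# Drop [CLS]/[SEP]; assumes each word maps to a single subword here.
token_reprs = hidden[0, 1:-1]
labels = torch.tensor([tag_to_id[t] for t in word_tags])

probe = nn.Linear(encoder.config.hidden_size, len(tag_to_id))
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(probe(token_reprs), labels)
    loss.backward()
    optimizer.step()

print(f"final probe loss: {loss.item():.4f}")

A probe like this reports whether the property is linearly recoverable from the frozen representations, but, as the abstract argues, it does not quantify how hard that recovery is, which is exactly the decodability gap the authors identify.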
URL
https://arxiv.org/abs/2301.11688