Feature Structure Distillation for BERT Transferring

2022-04-01 10:10:27

Hee-Jun Jung, Doyeon Kim, Seung-Hoon Na, Kangil Kim

arXiv_AI


Abstract

Knowledge distillation transfers information about representations from a teacher to a student by reducing the difference between them. A challenge of this approach is that it reduces the flexibility of the student's representations, which induces inaccurate learning of the teacher's knowledge. To resolve this in BERT transferring, we investigate distillation of the structures of representations, specified as three types: intra-feature, local inter-feature, and global inter-feature structures. To transfer them, we introduce feature structure distillation methods based on Centered Kernel Alignment, which assigns a consistent value to similar feature structures and reveals more informative relations. In particular, a memory-augmented transfer method with clustering is implemented for the global structures. In experiments on the nine language understanding tasks of the GLUE dataset, the proposed methods effectively transfer the three types of structures and improve performance compared to state-of-the-art distillation methods. The code for the methods is available at this https URL.
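
To make the role of Centered Kernel Alignment (CKA) more concrete, the following is a minimal sketch of the standard linear CKA similarity between two sets of features, written in plain Python to stay dependency-free. It is not the authors' implementation: the function names (`center_columns`, `cross_gram_sq_norm`, `linear_cka`) and the toy teacher and student matrices are assumptions made for illustration, and the abstract does not specify how CKA enters the actual distillation losses. The sketch only illustrates why CKA can assign a consistent value to similar feature structures: the score is unchanged when the features are uniformly rescaled or rotated.

```python
import math


def center_columns(m):
    """Subtract each column's mean so that every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b, for a (n x p) and b (n x q) as row lists."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between x (n x p) and y (n x q); rows are the same n examples.

    Assumes neither matrix is constant across examples, otherwise the
    denominator is zero.
    """
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(yc, xc)
    denominator = math.sqrt(
        cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc)
    )
    return numerator / denominator


# Hypothetical toy features: 3 examples, 2 dimensions each.
teacher = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]

# A "student" whose features are the teacher's, rotated by 90 degrees and
# scaled by 2. The values differ, but the structure is the same.
student = [[-2.0 * b, 2.0 * a] for a, b in teacher]

# Close to 1.0 (up to floating-point rounding), despite the different values.
print(linear_cka(teacher, student))
```

Because the score depends only on the relations among examples, the teacher and student may also have different feature dimensions, which is the usual situation when a smaller student is distilled from a larger BERT teacher.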

URL

https://arxiv.org/abs/2204.08922

PDF

https://arxiv.org/pdf/2204.08922.pdf
