Kformer: Knowledge Injection in Transformer Feed-Forward Layers

2022-01-15 03:00:27

Yunzhi Yao, Shaohan Huang, Ningyu Zhang, Li Dong, Furu Wei, Huajun Chen

arXiv_CL

arXiv_CL QA Knowledge Language_Model Transformer Pose Medical

Abstract

Knowledge-enhanced models have developed a diverse set of techniques for integrating knowledge from different knowledge sources. However, most previous work neglects the language model's own ability and simply concatenates external knowledge at the input. Recent work proposed that the Feed-Forward Network (FFN) in pre-trained language models can be seen as a memory that stores factual knowledge. In this work, we explore the FFN in Transformer and propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer. We empirically find that simply injecting knowledge into the FFN can enhance the pre-trained language model's ability and facilitate current knowledge fusion methods. Our results on two benchmarks in the commonsense reasoning (i.e., SocialIQA) and medical question answering (i.e., MedQA-USMLE) domains demonstrate that Kformer can utilize external knowledge deeply and achieves absolute improvements on these tasks.
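
The abstract does not spell out the injection mechanism, so the following is only a minimal sketch of one way to read "FFN as memory": the first FFN matrix acts as keys, the second as values, and projected knowledge embeddings are appended as extra key/value slots. The function names, the projection matrices `proj_k` and `proj_v`, the GELU activation, and the toy dimensions are all assumptions made for illustration, not the authors' implementation.

```python
import math


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix, given as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def gelu(z):
    """Tanh approximation of GELU (assumed activation for this sketch)."""
    return 0.5 * z * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (z + 0.044715 * z ** 3)))


def ffn(x, keys, values):
    """FFN viewed as key-value memory: sum_i gelu(x . k_i) * v_i."""
    scores = [gelu(s) for s in matvec(keys, x)]
    d_out = len(values[0])
    return [sum(s * v[j] for s, v in zip(scores, values)) for j in range(d_out)]


def ffn_with_knowledge(x, keys, values, knowledge, proj_k, proj_v):
    """Append projected knowledge embeddings as extra memory slots (hypothetical)."""
    extra_keys = [matvec(proj_k, e) for e in knowledge]
    extra_values = [matvec(proj_v, e) for e in knowledge]
    return ffn(x, keys + extra_keys, values + extra_values)


# Toy example: hidden size 2, one original memory slot, one knowledge embedding of size 3.
x = [1.0, 0.0]
keys = [[1.0, 0.0]]
values = [[2.0, 3.0]]
knowledge = [[0.5, -0.5, 1.0]]
proj_k = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # maps knowledge dim 3 -> hidden dim 2
proj_v = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

plain = ffn(x, keys, values)
injected = ffn_with_knowledge(x, keys, values, knowledge, proj_k, proj_v)
```

With an empty `knowledge` list, `ffn_with_knowledge` reduces to the plain `ffn`, which is one way to see that this kind of injection leaves the original layer's parameters untouched and only adds memory slots.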

URL

https://arxiv.org/abs/2201.05742

PDF

https://arxiv.org/pdf/2201.05742.pdf