Understanding Finetuning for Factual Knowledge Extraction from Language Models

2023-01-26 18:29:50
Mehran Kazemi, Sid Mittal, Deepak Ramachandran

Abstract

Language models (LMs) pretrained on large corpora of text from the web have been observed to contain large amounts of various types of knowledge about the world. This observation has led to a new and exciting paradigm in knowledge graph construction where, instead of manual curation or text mining, one extracts knowledge from the parameters of an LM. Recently, it has been shown that finetuning LMs on a set of factual knowledge makes them produce better answers to queries from a different set, thus making finetuned LMs good candidates for knowledge extraction and, consequently, knowledge graph construction. In this paper, we analyze finetuned LMs for factual knowledge extraction. We show that, along with its previously known positive effects, finetuning also leads to a (potentially harmful) phenomenon which we call Frequency Shock, where at test time the model over-predicts rare entities that appear in the training set and under-predicts common entities that do not appear in the training set often enough. We show that Frequency Shock degrades the model's predictions and that, beyond a point, its harm can even outweigh the positive effects of finetuning, making finetuning harmful overall. We then consider two solutions to remedy the identified negative effect: (1) model mixing and (2) mixture finetuning with the LM's pretraining task. The two solutions combined lead to significant improvements over vanilla finetuning.
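As a rough illustration of the first remedy, the sketch below interpolates the next-token distributions of the finetuned model and the original pretrained LM in probability space. This is a minimal sketch under stated assumptions: the function name `mixed_prediction` and the mixing weight `alpha` are illustrative choices, not the paper's API, and the paper's actual mixing scheme may differ in detail.

```python
import torch
import torch.nn.functional as F

def mixed_prediction(logits_finetuned: torch.Tensor,
                     logits_pretrained: torch.Tensor,
                     alpha: float = 0.5) -> torch.Tensor:
    """Mix two next-token distributions in probability space.

    Intuition from the paper: the finetuned model benefits from task
    adaptation but suffers from Frequency Shock (over-predicting rare
    training entities), while the pretrained model retains the original
    entity frequencies. A convex combination of their probabilities
    hedges between the two. `alpha` is an assumed hyperparameter here.
    """
    p_ft = F.softmax(logits_finetuned, dim=-1)
    p_pt = F.softmax(logits_pretrained, dim=-1)
    return alpha * p_ft + (1.0 - alpha) * p_pt

# Toy usage with random logits over a 10-token vocabulary.
vocab_size = 10
p = mixed_prediction(torch.randn(vocab_size), torch.randn(vocab_size), alpha=0.7)
print(p.sum())  # ~1.0: the mixture is still a valid distribution
```

Mixture finetuning, the second remedy, instead interleaves the factual-knowledge finetuning examples with examples of the LM's original pretraining objective during training, so the model is not pushed as far away from its pretrained entity statistics in the first place.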

URL

https://arxiv.org/abs/2301.11293

PDF

https://arxiv.org/pdf/2301.11293.pdf

