What does the Knowledge Neuron Thesis Have to do with Knowledge?

2024-05-03 18:34:37
Jingcheng Niu, Andrew Liu, Zining Zhu, Gerald Penn

Abstract

We reassess the Knowledge Neuron (KN) Thesis: an interpretation of the mechanism underlying the ability of large language models to recall facts from a training corpus. This nascent thesis proposes that facts are recalled from the training corpus through the MLP weights in a manner resembling key-value memory, implying in effect that "knowledge" is stored in the network. Furthermore, by modifying the MLP modules, one can control the language model's generation of factual information. The plausibility of the KN thesis has been demonstrated by the success of KN-inspired model editing methods (Dai et al., 2022; Meng et al., 2022). We find that this thesis is, at best, an oversimplification. Not only have we found that we can edit the expression of certain linguistic phenomena using the same model editing methods but, through a more comprehensive evaluation, we have found that the KN thesis does not adequately explain the process of factual expression. While it is possible to argue that the MLP weights store complex patterns that are interpretable both syntactically and semantically, these patterns do not constitute "knowledge." To gain a more comprehensive understanding of the knowledge representation process, we must look beyond the MLP weights and explore recent models' complex layer structures and attention mechanisms.
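
To make the key-value reading of the KN thesis concrete, the sketch below shows how a transformer MLP block can be viewed as a key-value memory, and how a KN-style edit rewrites one stored "value." This is a minimal numpy illustration, not the paper's method: the toy dimensions, random weights, and the neuron index kn_idx are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32  # toy dimensions, chosen only for illustration

# In the key-value view of an MLP block, the first projection holds
# "keys" (one row per hidden neuron) and the second holds "values"
# (one column per hidden neuron). Both matrices here are random toys.
W_in = rng.normal(size=(d_ff, d_model))   # keys: W_in[i] is key_i
W_out = rng.normal(size=(d_model, d_ff))  # values: W_out[:, i] is value_i

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp(h):
    a = gelu(W_in @ h)  # a[i]: how strongly the hidden state h matches key_i
    return W_out @ a    # output: activation-weighted sum of value vectors

h = rng.normal(size=d_model)  # stand-in for a hidden state at some token
print(mlp(h))

# A KN-style edit then amounts to overwriting the value vector of one
# hidden neuron, changing what it contributes whenever its key fires.
# (Hypothetical neuron index and replacement vector.)
kn_idx = 7
W_out[:, kn_idx] = rng.normal(size=d_model)
print(mlp(h))  # the block's output shifts accordingly
```

Under this view, overwriting value_i changes whatever that neuron expresses whenever its key matches; the paper's finding is that the same kind of edit can just as readily change the expression of linguistic phenomena, which is why the authors argue these stored patterns do not constitute "knowledge."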

URL

https://arxiv.org/abs/2405.02421

PDF

https://arxiv.org/pdf/2405.02421.pdf

