Evaluating Neuron Interpretation Methods of NLP Models

2023-01-30 02:04:35
Yimin Fan, Fahim Dalvi, Nadir Durrani, Hassan Sajjad

Abstract

Neuron interpretation has gained traction in the field of interpretability and has provided fine-grained insights into what a model learns and how language knowledge is distributed among its different components. However, the lack of evaluation benchmarks and metrics has led to siloed progress within these various methods, with very little work comparing them and highlighting their strengths and weaknesses. The reason for this discrepancy is the difficulty of creating ground-truth datasets: for example, many neurons within a given model may learn the same phenomenon, and hence there may not be one correct answer. Moreover, a learned phenomenon may spread across several neurons that work together, which makes surfacing them to create a gold standard challenging. In this work, we propose an evaluation framework that measures the compatibility of a neuron analysis method with other methods. We hypothesize that the more compatible a method is with the majority of the methods, the more confident one can be about its performance. We systematically evaluate our proposed framework and present a comparative analysis of a large set of neuron interpretation methods. We make the evaluation framework available to the community. It enables the evaluation of any new method using 20 concepts and across three pre-trained models. The code is released at this https URL
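
To make the idea of "compatibility with other methods" concrete, here is a minimal sketch, not the authors' implementation. It assumes each method returns a ranked list of neuron indices for a concept, and it scores a method by its mean top-k overlap with every other method. The function names, the overlap-based score, and the example rankings are all hypothetical; the abstract does not specify the exact metric.

```python
from typing import Dict, List


def top_k_overlap(a: List[int], b: List[int], k: int) -> float:
    """Fraction of neurons shared by the top-k entries of two rankings."""
    return len(set(a[:k]) & set(b[:k])) / k


def compatibility_scores(rankings: Dict[str, List[int]], k: int) -> Dict[str, float]:
    """Mean top-k overlap of each method with all other methods.

    Assumes at least two methods are present. This overlap-based score is
    an illustrative assumption, not the metric defined in the paper.
    """
    scores = {}
    for name, ranking in rankings.items():
        others = [r for other, r in rankings.items() if other != name]
        scores[name] = sum(top_k_overlap(ranking, r, k) for r in others) / len(others)
    return scores


# Hypothetical neuron rankings (most salient neuron first) for one concept.
rankings = {
    "method_a": [3, 7, 1, 9, 4],
    "method_b": [7, 3, 2, 1, 8],
    "method_c": [5, 6, 0, 2, 3],
}
scores = compatibility_scores(rankings, k=3)
```

Under this sketch, a method whose top neurons agree with most other methods receives a higher score, which mirrors the hypothesis stated in the abstract; a method that disagrees with all others, like the hypothetical `method_c`, scores lowest.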

URL

https://arxiv.org/abs/2301.12608

PDF

https://arxiv.org/pdf/2301.12608.pdf

