Paper Reading AI Learner

Does It Make Sense to Explain a Black Box With Another Black Box?

2024-04-23 11:40:30
Julien Delaunay, Luis Galárraga, Christine Largouët

Abstract

Although counterfactual explanations are a popular approach to explain ML black-box classifiers, they are less widespread in NLP. Most methods find these explanations by iteratively perturbing the target document until it is classified differently by the black box. We identify two main families of counterfactual explanation methods in the literature, namely, (a) \emph{transparent} methods that perturb the target by adding, removing, or replacing words, and (b) \emph{opaque} approaches that project the target document into a latent, non-interpretable space where the perturbation is subsequently carried out. This article offers a comparative study of the performance of these two families of methods on three classical NLP tasks. Our empirical evidence shows that opaque approaches can be overkill for downstream applications such as fake news detection or sentiment analysis, since they add a further level of complexity with no significant performance gain. These observations motivate our discussion, which raises the question of whether it makes sense to explain a black box using another black box.
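To make the "transparent" family concrete, here is a minimal sketch of a word-level counterfactual search against a black-box text classifier. The toy classifier, the replacement lexicon, and all function names are illustrative assumptions, not artifacts of the paper; the point is only the greedy perturb-until-the-label-flips loop.

```python
# Sketch of a transparent counterfactual search: greedily replace words in
# the target document until the black-box prediction flips.
# black_box and REPLACEMENTS are toy stand-ins (assumptions), not the
# classifiers or lexicons evaluated in the paper.

def black_box(doc):
    """Toy sentiment 'black box': positive iff the word 'good' appears."""
    return "pos" if "good" in doc.split() else "neg"

# Assumed replacement lexicon; real methods derive candidates from
# synonym/antonym resources or masked language models.
REPLACEMENTS = {"good": ["bad", "poor"], "bad": ["good"]}

def counterfactual(doc, classifier, replacements):
    """Return a one-word edit of doc that changes the classifier's label,
    or None if no single replacement in the lexicon flips it."""
    original = classifier(doc)
    words = doc.split()
    for i, word in enumerate(words):
        for alt in replacements.get(word, []):
            candidate = " ".join(words[:i] + [alt] + words[i + 1:])
            if classifier(candidate) != original:
                return candidate  # minimal perturbation that flips the label
    return None

cf = counterfactual("a good movie", black_box, REPLACEMENTS)
# cf == "a bad movie": the label flips from "pos" to "neg"
```

Because each edit is a visible word substitution, the resulting explanation is directly interpretable, which is exactly the property the opaque, latent-space family gives up.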

URL

https://arxiv.org/abs/2404.14943

PDF

https://arxiv.org/pdf/2404.14943.pdf

