Paper Reading AI Learner

Mechanistic understanding and validation of large AI models with SemanticLens

2025-01-09 17:47:34
Maximilian Dreyer, Jim Berend, Tobias Labarta, Johanna Vielhaben, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek

Abstract

Unlike human-engineered systems such as aeroplanes, where each component's role and dependencies are well understood, the inner workings of AI models remain largely opaque, hindering verifiability and undermining trust. This paper introduces SemanticLens, a universal explanation method for neural networks that maps hidden knowledge encoded by components (e.g., individual neurons) into the semantically structured, multimodal space of a foundation model such as CLIP. In this space, unique operations become possible, including (i) textual search to identify neurons encoding specific concepts, (ii) systematic analysis and comparison of model representations, (iii) automated labelling of neurons and explanation of their functional roles, and (iv) audits to validate decision-making against requirements. Fully scalable and operating without human input, SemanticLens is shown to be effective for debugging and validation, summarizing model knowledge, aligning reasoning with expectations (e.g., adherence to the ABCDE-rule in melanoma classification), and detecting components tied to spurious correlations and their associated training data. By enabling component-level understanding and validation, the proposed approach helps bridge the "trust gap" between AI models and traditional engineered systems. We provide code for SemanticLens on this https URL and a demo on this https URL.
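Operation (i) above — textual search for neurons encoding a given concept — can be sketched as a cosine-similarity lookup in CLIP's shared embedding space. The sketch below is an illustration under assumptions, not the authors' implementation: it assumes each neuron is summarized by the (normalized) CLIP image embedding of its most-activating examples, and it substitutes random vectors for real CLIP embeddings so it runs standalone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, dim = 1000, 512  # dim=512 matches common CLIP variants (assumption)

# Stand-in for the per-neuron concept embeddings: in SemanticLens these would
# be CLIP image embeddings of each neuron's top-activating examples.
neuron_embeddings = rng.normal(size=(n_neurons, dim))
neuron_embeddings /= np.linalg.norm(neuron_embeddings, axis=1, keepdims=True)

def search_neurons(text_embedding, neuron_embeddings, top_k=5):
    """Rank neurons by cosine similarity to a CLIP text-query embedding.

    Rows of `neuron_embeddings` are assumed L2-normalized, so the dot
    product with the normalized query equals cosine similarity.
    """
    text_embedding = text_embedding / np.linalg.norm(text_embedding)
    scores = neuron_embeddings @ text_embedding
    top = np.argsort(scores)[::-1][:top_k]  # indices of best-matching neurons
    return top, scores[top]

# Stand-in for the CLIP text embedding of a query such as "striped fur".
query = rng.normal(size=dim)
ids, scores = search_neurons(query, neuron_embeddings)
print(ids, scores)
```

In the real pipeline, `neuron_embeddings` and `query` would come from the same foundation model (e.g. CLIP's image and text encoders), which is what makes text-to-neuron search possible without human labeling.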


URL

https://arxiv.org/abs/2501.05398

PDF

https://arxiv.org/pdf/2501.05398.pdf

