
On Layer-wise Representation Similarity: Application for Multi-Exit Models with a Single Classifier

2024-06-20 16:41:09
Jiachen Jiang, Jinxin Zhou, Zhihui Zhu


Analyzing the similarity of internal representations within and across different models has been an important technique for understanding the behavior of deep neural networks. Most existing methods for analyzing the similarity between high-dimensional representations, such as those based on Canonical Correlation Analysis (CCA) and the widely used Centered Kernel Alignment (CKA), rely on statistical properties of the representations over a set of data points. In this paper, we focus on transformer models and study the similarity of representations between the hidden layers of individual transformers. In this context, we show that a simple sample-wise cosine similarity metric is capable of capturing the similarity and aligns with the more complicated CKA. Our experimental results on common transformers reveal that representations across layers are positively correlated, although the similarity decreases when layers are far apart. We then propose an aligned training approach to enhance the similarity between internal representations. The trained models enjoy the following properties: (1) the last-layer classifier can be applied directly after any hidden layer, yielding intermediate-layer accuracies much higher than those under standard training; (2) the layer-wise accuracies increase monotonically and reveal the minimal depth needed for the given task; (3) when serving as multi-exit models, they achieve performance on par with standard multi-exit architectures, which include additional classifiers designed for early exiting in shallow layers. To our knowledge, our work is the first to show that one common classifier is sufficient for multi-exit models. We conduct experiments on both vision and NLP tasks to demonstrate the performance of the proposed aligned training.
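
The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy "layers", and the random residual-style updates are all assumptions made for illustration. It computes a sample-wise cosine similarity between two layers' features, the standard linear CKA for comparison, and the accuracy obtained when one shared classifier is applied to every layer's features. The cosine metric assumes both layers have the same feature dimension, which holds for hidden layers within a single transformer.

```python
import numpy as np


def samplewise_cosine(h_a, h_b, eps=1e-12):
    """Average over samples of cos(h_a[i], h_b[i]).

    h_a, h_b: arrays of shape (n_samples, dim) from two layers.
    """
    num = np.sum(h_a * h_b, axis=1)
    den = np.linalg.norm(h_a, axis=1) * np.linalg.norm(h_b, axis=1) + eps
    return float(np.mean(num / den))


def linear_cka(h_a, h_b):
    """Linear CKA, a dataset-level statistic, shown here for comparison."""
    a = h_a - h_a.mean(axis=0, keepdims=True)
    b = h_b - h_b.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(a.T @ b, "fro") ** 2
    norm_a = np.linalg.norm(a.T @ a, "fro")
    norm_b = np.linalg.norm(b.T @ b, "fro")
    return float(cross / (norm_a * norm_b))


def shared_classifier_accuracy(hidden_states, weight, bias, labels):
    """Apply one linear classifier (weight, bias) to each layer's features."""
    accuracies = []
    for h in hidden_states:
        predictions = np.argmax(h @ weight + bias, axis=1)
        accuracies.append(float(np.mean(predictions == labels)))
    return accuracies


if __name__ == "__main__":
    # Hypothetical stand-in for a transformer: each "layer" adds a small
    # random update to the previous features, mimicking a residual stream.
    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(64, 16))]
    for _ in range(5):
        layers.append(layers[-1] + 0.3 * rng.normal(size=(64, 16)))

    for depth in range(1, len(layers)):
        cos = samplewise_cosine(layers[0], layers[depth])
        cka = linear_cka(layers[0], layers[depth])
        print(f"layer 0 vs layer {depth}: cosine={cos:.3f}  CKA={cka:.3f}")
```

Unlike CKA, the cosine metric is defined for each sample individually, so it needs no statistics gathered over a batch; this is what makes it usable as a per-example quantity, for instance in a training objective.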



