Paper Reading AI Learner

Rethinking The Uniformity Metric in Self-Supervised Learning

2024-03-01 16:22:05
Xianghong Fang, Jian Li, Qiang Sun, Benyou Wang

Abstract

Uniformity plays a crucial role in the assessment of learned representations, contributing to a deeper understanding of self-supervised learning. The seminal work by Wang and Isola (2020) introduced a uniformity metric that quantitatively measures the degree of collapse of learned representations. Directly optimizing this metric together with alignment proves effective in preventing constant collapse. However, we present both theoretical and empirical evidence that this metric lacks sensitivity to dimensional collapse, which limits its usefulness. To address this limitation and design a more effective uniformity metric, this paper identifies five fundamental properties, some of which the existing uniformity metric fails to meet. We then introduce a novel uniformity metric that satisfies all of these desiderata and is sensitive to dimensional collapse. When applied as an auxiliary loss in various established self-supervised methods, our proposed uniformity metric consistently enhances their performance in downstream tasks. Our code was released at this https URL.
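
For readers unfamiliar with the earlier metric the abstract refers to, the sketch below computes the uniformity loss as it is commonly stated for Wang and Isola (2020): the logarithm of the mean Gaussian potential exp(-t * ||u - v||^2) over pairs of unit-normalized representations. This is a minimal illustration and not the authors' code. The temperature t = 2, the helper names `normalize` and `uniformity`, and the toy data are all assumptions made here. It shows only the constant-collapse case, where every representation is identical; it does not implement the new metric proposed in the paper.

```python
import math
import random


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def uniformity(points, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs.

    Lower (more negative) values mean the points are spread more evenly.
    t=2.0 is an assumed default for this sketch.
    """
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            total += math.exp(-t * sq_dist)
            count += 1
    return math.log(total / count)


rng = random.Random(0)  # fixed seed so the toy data is reproducible
dim, n = 8, 200

# Constant collapse: every representation is the same point on the sphere.
collapsed = [normalize([1.0] * dim) for _ in range(n)]

# Spread-out case: normalized Gaussian samples are uniform on the sphere.
spread = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(n)]

collapsed_score = uniformity(collapsed)  # all distances are 0, so log(1) = 0
spread_score = uniformity(spread)        # strictly negative

print(f"collapsed: {collapsed_score:.4f}  spread: {spread_score:.4f}")
```

Under constant collapse every pairwise distance is zero, so the metric attains its maximum of 0; well-spread points give a clearly negative value. This is the sense in which minimizing the quantity discourages constant collapse.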

URL

https://arxiv.org/abs/2403.00642

PDF

https://arxiv.org/pdf/2403.00642.pdf

