Paper Reading AI Learner

Contrastive Self-Supervised Learning As Neural Manifold Packing

2025-06-16 17:24:31
Guanming Zhang, David J. Heeger, Stefano Martiniani

Abstract

Contrastive self-supervised learning based on point-wise comparisons has been widely studied for vision tasks. In the visual cortex of the brain, neuronal responses to distinct stimulus classes are organized into geometric structures known as neural manifolds. Accurate classification of stimuli can be achieved by effectively separating these manifolds, akin to solving a packing problem. We introduce Contrastive Learning As Manifold Packing (CLAMP), a self-supervised framework that recasts representation learning as a manifold packing problem. CLAMP introduces a loss function inspired by the potential energy of short-range repulsive particle systems, such as those encountered in the physics of simple liquids and jammed packings. In this framework, each class consists of sub-manifolds embedding multiple augmented views of a single image. The sizes and positions of the sub-manifolds are dynamically optimized by following the gradient of a packing loss. This approach yields interpretable dynamics in the embedding space that parallel jamming physics, and introduces geometrically meaningful hyperparameters within the loss function. Under the standard linear evaluation protocol, which freezes the backbone and trains only a linear classifier, CLAMP achieves performance competitive with state-of-the-art self-supervised models. Furthermore, our analysis reveals that neural manifolds corresponding to different categories emerge naturally and are effectively separated in the learned representation space, highlighting the potential of CLAMP to bridge insights from physics, neuroscience, and machine learning.
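To make the packing picture concrete, here is a minimal NumPy sketch of a short-range repulsive packing loss over sub-manifolds. It is an illustration under stated assumptions, not the authors' CLAMP loss. It assumes that each image's augmented views are summarized as a ball (center = mean embedding, radius = `radius_scale` times the RMS spread of the views), and that overlapping balls repel through a harmonic soft-sphere potential, 0.5 * (1 - d_ij / (r_i + r_j))^2 when d_ij < r_i + r_j, as used in jamming models. The function names and the `radius_scale` and `eps` parameters are hypothetical.

```python
import numpy as np


def submanifold_centers_and_radii(views, radius_scale=1.0):
    """Summarize each image's augmented views as a ball (an assumption).

    views: array of shape (n_images, n_views, dim).
    Returns centers of shape (n_images, dim) and radii of shape (n_images,).
    """
    centers = views.mean(axis=1)                                  # mean embedding per image
    sq_dist = ((views - centers[:, None, :]) ** 2).sum(axis=-1)   # squared distance of each view to its center
    radii = radius_scale * np.sqrt(sq_dist.mean(axis=1))          # scaled RMS spread of the views
    return centers, radii


def packing_loss(views, radius_scale=1.0, eps=1e-12):
    """Harmonic short-range repulsion between sub-manifold balls (hypothetical form).

    Pair (i, j) contributes 0.5 * (1 - d_ij / (r_i + r_j))**2 only when the
    balls overlap; non-overlapping pairs contribute exactly zero.
    """
    centers, radii = submanifold_centers_and_radii(views, radius_scale)
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))                      # pairwise center distances
    contact = np.maximum(radii[:, None] + radii[None, :], eps)    # contact distances, guarded against zero radii
    overlap = np.clip(1.0 - dist / contact, 0.0, None)            # zero unless the balls overlap
    iu = np.triu_indices(len(radii), k=1)                         # each pair once, no self-interaction
    return 0.5 * np.sum(overlap[iu] ** 2)
```

Two images whose views sit far apart incur no loss. Overlapping sub-manifolds are penalized, and the gradient of this energy pushes their centers apart and shrinks their radii. In this sketch, `radius_scale` plays the role of a geometric hyperparameter: a smaller value makes each sub-manifold effectively smaller, so the same configuration can become overlap-free. In practice the gradient would come from an autodiff framework; NumPy is used here only to show the energy itself.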


URL

https://arxiv.org/abs/2506.13717

PDF

https://arxiv.org/pdf/2506.13717.pdf

