Abstract
Self-supervised learning (SSL) has emerged as a crucial technique in image processing, encoding, and understanding, especially for developing today's vision foundation models, which leverage large-scale unannotated datasets to enhance various downstream tasks. This study introduces a novel SSL approach, Information-Maximized Soft Variable Discretization (IMSVD), for image representation learning. Specifically, IMSVD softly discretizes each variable in the latent space, enabling the estimation of its probability distribution over training batches and allowing the learning process to be directly guided by information measures. Motivated by the MultiView assumption, we propose an information-theoretic objective function to learn transform-invariant, non-trivial, and redundancy-minimized representation features. We then derive a joint cross-entropy loss function for self-supervised image representation learning, which is theoretically superior to existing methods at reducing feature redundancy. Notably, although IMSVD is non-contrastive, it statistically performs contrastive learning. Extensive experimental results demonstrate the effectiveness of IMSVD on various downstream tasks in terms of both accuracy and efficiency. Thanks to our variable discretization, the embedding features optimized by IMSVD offer unique explainability at the variable level. IMSVD has the potential to be adapted to other learning paradigms. Our code is publicly available at this https URL.
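To make the mechanism concrete, below is a minimal PyTorch sketch of the two ingredients the abstract describes: soft discretization of each latent variable into bins, and batch-wise estimation of the resulting distributions so that information measures (per-variable entropy, cross-entropy between two augmented views) can serve directly as loss terms. The function names, tensor shapes, and the particular loss combination are illustrative assumptions, not the paper's exact IMSVD objective.

```python
import torch
import torch.nn.functional as F

def soft_discretize(logits, temperature=1.0):
    """Softly assign each latent variable to one of B bins.

    logits: (batch, D, B) -- B bin logits per latent variable, per sample.
    Returns soft assignment probabilities of the same shape.
    """
    return F.softmax(logits / temperature, dim=-1)

def batch_marginals(p):
    """Estimate each variable's bin distribution by averaging soft
    assignments over the batch: q[d, b] ~= P(variable d falls in bin b)."""
    return p.mean(dim=0)  # (D, B)

def marginal_entropy(q, eps=1e-8):
    """Shannon entropy of each variable's estimated distribution; maximizing
    it discourages collapsed (uninformative) variables."""
    return -(q * (q + eps).log()).sum(dim=-1)  # (D,)

def cross_view_agreement(p_a, p_b, eps=1e-8):
    """Cross-entropy between the two views' soft assignments; minimizing it
    encourages transform-invariant codes (hypothetical invariance term)."""
    return -(p_b * (p_a + eps).log()).sum(dim=-1).mean()

# Toy usage: two augmented views embedded into D=4 variables with B=8 bins.
logits_a, logits_b = torch.randn(32, 4, 8), torch.randn(32, 4, 8)
p_a, p_b = soft_discretize(logits_a), soft_discretize(logits_b)
loss = cross_view_agreement(p_a, p_b) - marginal_entropy(batch_marginals(p_a)).mean()
```

Because the soft assignments are differentiable, the batch-estimated distributions can be plugged into any information-theoretic criterion and optimized end to end; the paper's joint cross-entropy additionally targets redundancy across variable pairs, which this sketch omits.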
URL
https://arxiv.org/abs/2501.03469