Self-supervised visual learning in the low-data regime: a comparative evaluation

2024-04-26 07:23:14
Sotirios Konstantakos, Despina Ioanna Chalkiadaki, Ioannis Mademlis, Yuki M. Asano, Efstratios Gavves, Georgios Th. Papadopoulos

Abstract

Self-Supervised Learning (SSL) is a valuable and robust training methodology for contemporary Deep Neural Networks (DNNs), enabling unsupervised pretraining on a 'pretext task' that does not require ground-truth labels/annotation. This allows efficient representation learning from massive amounts of unlabeled training data, which in turn leads to increased accuracy in a 'downstream task' by exploiting supervised transfer learning. Despite the relatively straightforward conceptualization and applicability of SSL, it is not always feasible to collect and/or utilize very large pretraining datasets, especially in real-world application settings. In particular, in specialized and domain-specific application scenarios, it may not be achievable or practical to assemble a relevant image pretraining dataset on the order of millions of instances, or it may be computationally infeasible to pretrain at this scale. This motivates an investigation into the effectiveness of common SSL pretext tasks when the pretraining dataset is of relatively limited/constrained size. In this context, this work introduces a taxonomy of modern visual SSL methods, accompanied by detailed explanations and insights regarding the main categories of approaches, and subsequently conducts a thorough comparative experimental evaluation in the low-data regime, aiming to identify: a) what is learnt via low-data SSL pretraining, and b) how different SSL categories behave in such training scenarios. Interestingly, for domain-specific downstream tasks, in-domain low-data SSL pretraining outperforms the common approach of large-scale pretraining on general datasets. Grounded in the obtained results, valuable insights are highlighted regarding the performance of each category of SSL methods, which in turn suggest straightforward future research directions in the field.
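
For concreteness, the pretrain-then-transfer pipeline the abstract describes can be summarized in code. The following is a minimal, illustrative PyTorch sketch (not the paper's implementation): it uses a SimCLR-style contrastive pretext task as one representative SSL method, with random tensors standing in for real images and data augmentations, and a frozen-encoder linear probe standing in for the supervised downstream stage. All module sizes and hyperparameters are placeholder choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoder and projection head (stand-ins for e.g. a ResNet backbone).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
projector = nn.Linear(256, 64)
opt = torch.optim.Adam(list(encoder.parameters()) + list(projector.parameters()), lr=1e-3)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss: pull two views of the same image together,
    push apart views of different images within the batch."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)            # (2N, d), unit norm
    sim = z @ z.t() / tau                                  # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))             # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)                   # positive = the other view

# Stage 1: label-free pretraining on the pretext task.
for _ in range(10):                                        # toy number of steps
    x = torch.rand(32, 3, 32, 32)                          # unlabeled image batch
    v1 = x + 0.05 * torch.randn_like(x)                    # two crude "augmented views";
    v2 = x + 0.05 * torch.randn_like(x)                    # real SSL uses crops, color jitter, etc.
    loss = nt_xent(projector(encoder(v1)), projector(encoder(v2)))
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: supervised transfer to the downstream task (linear probe).
head = nn.Linear(256, 10)                                  # placeholder: 10 downstream classes
head_opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(10):
    x, y = torch.rand(32, 3, 32, 32), torch.randint(0, 10, (32,))  # labeled downstream data
    with torch.no_grad():                                  # pretrained encoder kept frozen
        feats = encoder(x)
    cls_loss = F.cross_entropy(head(feats), y)
    head_opt.zero_grad(); cls_loss.backward(); head_opt.step()
```

In a real low-data setting, Stage 1 would run on the (limited) in-domain unlabeled dataset rather than a generic million-image corpus, which is precisely the regime the paper evaluates.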

URL

https://arxiv.org/abs/2404.17202

PDF

https://arxiv.org/pdf/2404.17202.pdf

