
NASiam: Efficient Representation Learning using Neural Architecture Search for Siamese Networks


Abstract

Siamese networks are among the most trending methods for self-supervised visual representation learning (SSL). Since hand labeling is costly, SSL can play a crucial role by allowing deep learning models to train on large unlabeled datasets. Meanwhile, Neural Architecture Search (NAS) is becoming increasingly important as a technique for discovering novel deep learning architectures. However, early NAS methods based on reinforcement learning or evolutionary algorithms suffered from prohibitive computational and memory costs. In contrast, differentiable NAS, a gradient-based approach, is much more efficient and has thus attracted most of the attention in the past few years. In this article, we present NASiam, a novel approach that, for the first time, uses differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair) architectures inside siamese-network-based contrastive learning frameworks (e.g., SimCLR, SimSiam, and MoCo) while preserving the simplicity of previous baselines. We crafted a search space designed explicitly for multilayer perceptrons, inside which we explored several alternatives to the standard ReLU activation function. We show that these new architectures allow ResNet backbone convolutional models to learn strong representations efficiently. NASiam reaches competitive performance on both small-scale (i.e., CIFAR-10/CIFAR-100) and large-scale (i.e., ImageNet) image classification datasets while costing only a few GPU hours. We discuss the composition of the NAS-discovered architectures and offer hypotheses on why they manage to prevent collapsing behavior. Our code is available at this https URL.
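
The abstract describes searching the MLP projector/predictor architecture of a Siamese SSL framework with gradient-based (differentiable) NAS. As a rough illustration only, below is a minimal PyTorch sketch of a DARTS-style mixed operation over candidate MLP layers; the candidate-operation list, dimensions, and class names (`MixedMLPLayer`, `SearchableProjector`) are assumptions made for this sketch and do not reproduce the paper's actual search space or training setup.

```python
# Minimal sketch of a DARTS-style differentiable search over an MLP projector head.
# NOT the authors' code: candidate ops, dimensions, and class names are illustrative
# assumptions inspired by the abstract (which mentions exploring ReLU alternatives).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical candidate operations for one searchable projector layer.
CANDIDATES = {
    "linear_bn_relu":   lambda d_in, d_out: nn.Sequential(
        nn.Linear(d_in, d_out, bias=False), nn.BatchNorm1d(d_out), nn.ReLU(inplace=True)),
    "linear_bn_silu":   lambda d_in, d_out: nn.Sequential(
        nn.Linear(d_in, d_out, bias=False), nn.BatchNorm1d(d_out), nn.SiLU(inplace=True)),
    "linear_bn_hswish": lambda d_in, d_out: nn.Sequential(
        nn.Linear(d_in, d_out, bias=False), nn.BatchNorm1d(d_out), nn.Hardswish(inplace=True)),
    "linear_bn":        lambda d_in, d_out: nn.Sequential(
        nn.Linear(d_in, d_out, bias=False), nn.BatchNorm1d(d_out)),
}

class MixedMLPLayer(nn.Module):
    """One searchable MLP layer: a softmax-weighted sum of candidate ops (DARTS-style)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.ops = nn.ModuleList(build(d_in, d_out) for build in CANDIDATES.values())
        # Architecture parameters (one logit per candidate), optimized by gradient descent.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

class SearchableProjector(nn.Module):
    """A projector/predictor head whose per-layer operation is searched rather than fixed."""
    def __init__(self, d_in=2048, d_hidden=2048, d_out=2048, num_layers=3):
        super().__init__()
        dims = [d_in] + [d_hidden] * (num_layers - 1) + [d_out]
        self.layers = nn.ModuleList(MixedMLPLayer(a, b) for a, b in zip(dims[:-1], dims[1:]))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def arch_parameters(self):
        # Architecture logits, typically updated on a separate data split (bi-level setup).
        return [layer.alpha for layer in self.layers]

if __name__ == "__main__":
    proj = SearchableProjector(d_in=512, d_hidden=512, d_out=512, num_layers=2)
    z = proj(torch.randn(8, 512))  # backbone features -> projected embedding
    print(z.shape, [a.shape for a in proj.arch_parameters()])
```

In a bi-level search of this kind, the architecture logits `alpha` would typically be optimized on held-out data while the network weights follow the SSL objective (e.g., SimSiam's negative cosine similarity), and after the search each layer keeps only its highest-weighted candidate; whether NASiam follows exactly this recipe is not specified by the abstract alone.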

Abstract (translated)

Siamese networks are one of the most trending approaches to self-supervised visual representation learning (SSL). Since hand labeling is costly, SSL can play a crucial role by allowing deep learning to train on large unlabeled datasets. Meanwhile, Neural Architecture Search (NAS) is becoming increasingly important as a technique for discovering novel deep learning architectures. However, early NAS methods based on reinforcement learning and evolutionary algorithms suffered from prohibitive computational and memory costs. In contrast, differentiable NAS, a gradient-based approach, is much more efficient and has therefore attracted most of the attention in recent years. In this paper, we introduce NASiam, a new method that, for the first time, uses differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair) architectures in Siamese-network-based contrastive learning frameworks, while preserving the simplicity of previous baselines. We designed a search space dedicated to multilayer perceptrons, within which we explored several alternatives to the standard ReLU activation function. We show that these new architectures enable ResNet backbone convolutional models to learn strong representations efficiently. NASiam achieves competitive performance on both small-scale (e.g., CIFAR-10/100) and large-scale (e.g., ImageNet) image classification datasets while costing only a few GPU hours. We discuss the composition of the NAS-discovered architectures and put forward hypotheses on why they prevent collapsing behavior. Our code is available at this https URL.

URL

https://arxiv.org/abs/2302.00059

PDF

https://arxiv.org/pdf/2302.00059.pdf

