
NASiam: Efficient Representation Learning using Neural Architecture Search for Siamese Networks


Abstract

Siamese networks are among the most trending methods for self-supervised visual representation learning (SSL). Since hand labeling is costly, SSL can play a crucial role by allowing deep learning models to train on large unlabeled datasets. Meanwhile, Neural Architecture Search (NAS) is becoming increasingly important as a technique for discovering novel deep learning architectures. However, early NAS methods based on reinforcement learning or evolutionary algorithms suffered from prohibitive computational and memory costs. In contrast, differentiable NAS, a gradient-based approach, is much more efficient and has thus attracted most of the attention in the past few years. In this article, we present NASiam, a novel approach that, for the first time, uses differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair) architectures inside siamese-network-based contrastive learning frameworks (e.g., SimCLR, SimSiam, and MoCo) while preserving the simplicity of previous baselines. We crafted a search space designed explicitly for multilayer perceptrons, inside which we explored several alternatives to the standard ReLU activation function. We show that these new architectures allow ResNet backbone convolutional models to learn strong representations efficiently. NASiam reaches competitive performance on both small-scale (i.e., CIFAR-10/CIFAR-100) and large-scale (i.e., ImageNet) image classification datasets while costing only a few GPU hours. We discuss the composition of the NAS-discovered architectures and offer hypotheses on why they manage to prevent collapsing behavior. Our code is available at this https URL.
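
The abstract describes searching the MLP projector/predictor architecture of a Siamese SSL framework with gradient-based (differentiable) NAS. As a rough illustration only, below is a minimal PyTorch sketch of a DARTS-style mixed operation over candidate MLP layers; the candidate-operation list, dimensions, and class names (`MixedMLPLayer`, `SearchableProjector`) are assumptions made for this sketch and do not reproduce the paper's actual search space or training setup.

```python
# Minimal sketch of a DARTS-style differentiable search over an MLP projector head.
# NOT the authors' code: candidate ops, dimensions, and class names are illustrative
# assumptions inspired by the abstract (which mentions exploring ReLU alternatives).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical candidate operations for one searchable projector layer.
CANDIDATES = {
    "linear_bn_relu":   lambda d_in, d_out: nn.Sequential(
        nn.Linear(d_in, d_out, bias=False), nn.BatchNorm1d(d_out), nn.ReLU(inplace=True)),
    "linear_bn_silu":   lambda d_in, d_out: nn.Sequential(
        nn.Linear(d_in, d_out, bias=False), nn.BatchNorm1d(d_out), nn.SiLU(inplace=True)),
    "linear_bn_hswish": lambda d_in, d_out: nn.Sequential(
        nn.Linear(d_in, d_out, bias=False), nn.BatchNorm1d(d_out), nn.Hardswish(inplace=True)),
    "linear_bn":        lambda d_in, d_out: nn.Sequential(
        nn.Linear(d_in, d_out, bias=False), nn.BatchNorm1d(d_out)),
}

class MixedMLPLayer(nn.Module):
    """One searchable MLP layer: a softmax-weighted sum of candidate ops (DARTS-style)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.ops = nn.ModuleList(build(d_in, d_out) for build in CANDIDATES.values())
        # Architecture parameters (one logit per candidate), optimized by gradient descent.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

class SearchableProjector(nn.Module):
    """A projector/predictor head whose per-layer operation is searched rather than fixed."""
    def __init__(self, d_in=2048, d_hidden=2048, d_out=2048, num_layers=3):
        super().__init__()
        dims = [d_in] + [d_hidden] * (num_layers - 1) + [d_out]
        self.layers = nn.ModuleList(MixedMLPLayer(a, b) for a, b in zip(dims[:-1], dims[1:]))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def arch_parameters(self):
        # Architecture logits, typically updated on a separate data split (bi-level setup).
        return [layer.alpha for layer in self.layers]

if __name__ == "__main__":
    proj = SearchableProjector(d_in=512, d_hidden=512, d_out=512, num_layers=2)
    z = proj(torch.randn(8, 512))  # backbone features -> projected embedding
    print(z.shape, [a.shape for a in proj.arch_parameters()])
```

In a bi-level search of this kind, the architecture logits `alpha` would typically be optimized on held-out data while the network weights follow the SSL objective (e.g., SimSiam's negative cosine similarity), and after the search each layer keeps only its highest-weighted candidate; whether NASiam follows exactly this recipe is not specified by the abstract alone.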

Abstract (translated)

Siamese networks are one of the most trending approaches to self-supervised visual representation learning (SSL). Since hand labeling is costly, SSL can play a crucial role by allowing deep learning to train on large unlabeled datasets. Meanwhile, Neural Architecture Search (NAS) is becoming increasingly important as a technique for discovering novel deep learning architectures. However, early NAS methods based on reinforcement learning and evolutionary algorithms suffered from prohibitive computational and memory costs. In contrast, differentiable NAS, a gradient-based approach, is much more efficient and has therefore attracted most of the attention in recent years. In this paper, we introduce NASiam, a new method that, for the first time, uses differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair) architectures in Siamese-network-based contrastive learning frameworks, while preserving the simplicity of previous baselines. We designed a search space dedicated to multilayer perceptrons, within which we explored several alternatives to the standard ReLU activation function. We show that these new architectures enable ResNet backbone convolutional models to learn strong representations efficiently. NASiam achieves competitive performance on both small-scale (e.g., CIFAR-10/100) and large-scale (e.g., ImageNet) image classification datasets while costing only a few GPU hours. We discuss the composition of the NAS-discovered architectures and put forward hypotheses on why they prevent collapsing behavior. Our code is available at this https URL.

URL

https://arxiv.org/abs/2302.00059

PDF

https://arxiv.org/pdf/2302.00059.pdf

