FlatNAS: optimizing Flatness in Neural Architecture Search for Out-of-Distribution Robustness

2024-02-29 12:33:14
Matteo Gambella, Fabrizio Pittorino, Manuel Roveri

Abstract

Neural Architecture Search (NAS) paves the way for the automatic definition of Neural Network (NN) architectures, attracting increasing research attention and offering solutions in various scenarios. This study introduces a novel NAS solution, called Flat Neural Architecture Search (FlatNAS), which explores the interplay between a novel figure of merit based on robustness to weight perturbations and single NN optimization with Sharpness-Aware Minimization (SAM). FlatNAS is the first work in the literature to systematically explore flat regions in the loss landscape of NNs in a NAS procedure, while jointly optimizing their performance on in-distribution data and their out-of-distribution (OOD) robustness, and constraining the number of parameters in their architecture. Unlike current studies, which concentrate primarily on OOD algorithms, FlatNAS successfully evaluates the impact of NN architectures on OOD robustness, a crucial aspect in real-world applications of machine and deep learning. FlatNAS achieves a good trade-off between performance, OOD generalization, and the number of parameters by using only in-distribution data in the NAS exploration. The OOD robustness of the NAS-designed models is evaluated by focusing on robustness to input data corruptions, using popular benchmark datasets in the literature.
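
The abstract describes a figure of merit based on robustness to weight perturbations but does not give its definition here. As a rough illustration of the general idea only, the sketch below estimates how much the loss of a small linear classifier rises when Gaussian noise is added to its weights; a smaller rise suggests a flatter region of the loss landscape. The function names (`logistic_loss`, `flatness_gap`), the additive Gaussian noise model, and the parameters `sigma` and `n_samples` are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def logistic_loss(weights, data):
    """Mean logistic loss of a linear classifier on (features, label) pairs, label in {0, 1}."""
    total = 0.0
    for features, label in data:
        z = sum(w * x for w, x in zip(weights, features))
        margin = z if label == 1 else -z
        # Numerically stable log(1 + exp(-margin))
        if margin >= 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(data)


def flatness_gap(weights, data, sigma=0.05, n_samples=20, seed=0):
    """Average loss increase under Gaussian weight noise of scale sigma.

    Hypothetical proxy for flatness (assumption, not the paper's metric):
    values closer to zero indicate a flatter neighbourhood.
    """
    rng = random.Random(seed)
    base = logistic_loss(weights, data)
    perturbed = 0.0
    for _ in range(n_samples):
        noisy = [w + rng.gauss(0.0, sigma) for w in weights]
        perturbed += logistic_loss(noisy, data)
    return perturbed / n_samples - base


# Toy in-distribution data: two features, binary labels
toy_data = [([1.0, 2.0], 1), ([-1.5, 0.5], 0), ([0.5, -1.0], 0), ([2.0, 1.0], 1)]
gap = flatness_gap([0.8, 0.6], toy_data, sigma=0.1)
print(f"estimated loss increase under weight noise: {gap:.4f}")
```

In a NAS setting, a score of this kind could in principle be computed for each candidate network using in-distribution data only and combined with accuracy and parameter count, which matches the trade-off the abstract describes; how FlatNAS actually combines these objectives is specified in the paper itself.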

URL

https://arxiv.org/abs/2402.19102

PDF

https://arxiv.org/pdf/2402.19102.pdf
