Paper Reading AI Learner

Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

2024-03-15 15:47:54
Hongyuan Yu, Cheng Wan, Mengchen Liu, Dongdong Chen, Bin Xiao, Xiyang Dai

Abstract

Image segmentation is one of the most fundamental problems in computer vision and has drawn a lot of attentions due to its vast applications in image understanding and autonomous driving. However, designing effective and efficient segmentation neural architectures is a labor-intensive process that may require lots of trials by human experts. In this paper, we address the challenge of integrating multi-head self-attention into high resolution representation CNNs efficiently, by leveraging architecture search. Manually replacing convolution layers with multi-head self-attention is non-trivial due to the costly overhead in memory to maintain high resolution. By contrast, we develop a multi-target multi-branch supernet method, which not only fully utilizes the advantages of high-resolution features, but also finds the proper location for placing multi-head self-attention module. Our search algorithm is optimized towards multiple objective s (e.g., latency and mIoU) and capable of finding architectures on Pareto frontier with arbitrary number of branches in a single search. We further present a series of model via Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method that searched for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers between branches from different resolutions and fuse to high resolution for both efficiency and effectiveness. Extensive experiments demonstrate that HyCTAS outperforms previous methods on semantic segmentation task. Code and models are available at \url{this https URL}.

Abstract (translated)

图像分割是计算机视觉中最基本的问题之一,由于其在图像理解和自动驾驶中的广泛应用,因此受到了很多关注。然而,设计有效且高效的分割神经架构是一个劳动密集的过程,可能需要许多专家的人工作尝试。在本文中,我们通过利用架构搜索解决了将多头自注意力集成到高分辨率表示CNNs中的问题,通过构建多目标多分支超级网络。通过手动替换卷积层为多头自注意力,由于需要高昂的内存开销来维持高分辨率,因此这是不可能的。相反,我们开发了一种多目标多分支超级网络方法,不仅充分利用了高分辨率特征的优势,而且发现了放置多头自注意力的适当位置。我们的搜索算法针对多个目标(如延迟和mIoU)进行优化,可以在一个搜索中找到架构在帕累托前沿的任意数量分支上的最优架构。我们还通过HyCTAS方法展示了一系列模型,该方法在寻找不同分辨率分支的最佳轻量级卷积层和内存高效的自注意力层之间进行了搜索,将高分辨率与效率进行了平衡。大量实验证明,HyCTAS在语义分割任务上优于以前的算法。代码和模型可在此处访问:\url{这个链接}。

URL

https://arxiv.org/abs/2403.10413

PDF

https://arxiv.org/pdf/2403.10413.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot