Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

Abstract

Image segmentation is one of the most fundamental problems in computer vision and has drawn a lot of attention due to its wide applications in image understanding and autonomous driving. However, designing effective and efficient segmentation architectures is a labor-intensive process that may require many trials by human experts. In this paper, we address the challenge of efficiently integrating multi-head self-attention into high-resolution representation CNNs by leveraging architecture search. Manually replacing convolution layers with multi-head self-attention is non-trivial because of the costly memory overhead of maintaining high resolution. By contrast, we develop a multi-target, multi-branch supernet method that not only fully utilizes the advantages of high-resolution features but also finds the proper locations for placing multi-head self-attention modules. Our search algorithm is optimized towards multiple objectives (e.g., latency and mIoU) and is capable of finding architectures on the Pareto frontier with an arbitrary number of branches in a single search. We further present a series of models found via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method, which searches for the best hybrid combination of lightweight convolution layers and memory-efficient self-attention layers across branches of different resolutions and fuses them to high resolution for both efficiency and effectiveness. Extensive experiments demonstrate that HyCTAS outperforms previous methods on the semantic segmentation task. Code and models are available at \url{this https URL}.
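
To make the phrase "architectures on the Pareto frontier" concrete, the following is a minimal sketch of how non-dominated candidates could be selected when the two objectives are latency (lower is better) and mIoU (higher is better). It is not the HyCTAS search algorithm itself: the `Candidate` type, the function names, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical searched architecture with its two measured objectives."""
    name: str
    latency_ms: float  # lower is better
    miou: float        # higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.miou >= b.miou
    strictly_better = a.latency_ms < b.latency_ms or a.miou > b.miou
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates, sorted by latency."""
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    return sorted(front, key=lambda c: c.latency_ms)


# Made-up numbers, not results from the paper.
candidates = [
    Candidate("arch_a", 10.0, 70.0),
    Candidate("arch_b", 15.0, 75.0),
    Candidate("arch_c", 15.0, 72.0),  # dominated by arch_b (same latency, lower mIoU)
    Candidate("arch_d", 25.0, 74.0),  # dominated by arch_b (slower and lower mIoU)
    Candidate("arch_e", 30.0, 78.0),
]

front = pareto_front(candidates)
print([c.name for c in front])  # ['arch_a', 'arch_b', 'arch_e']
```

Each architecture left on the front represents a different latency/accuracy trade-off, which is why a single multi-objective search can yield several deployable models rather than one.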

Abstract (translated)

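Image segmentation is one of the most fundamental problems in computer vision and has received a great deal of attention because of its wide applications in image understanding and autonomous driving. However, designing effective and efficient segmentation architectures is a labor-intensive process that may require many manual attempts by experts. In this paper, we use architecture search to address the problem of integrating multi-head self-attention into high-resolution representation CNNs. Manually replacing convolution layers with multi-head self-attention is difficult because of the high memory cost of maintaining high resolution. Instead, we develop a multi-target, multi-branch supernet method that not only makes full use of high-resolution features but also finds suitable positions for placing multi-head self-attention. Our search algorithm optimizes for multiple objectives (such as latency and mIoU) and can find architectures on the Pareto frontier with an arbitrary number of branches in a single search. We also present a series of models obtained with the HyCTAS method, which searches for the best combination of lightweight convolution layers and memory-efficient self-attention layers across branches of different resolutions and fuses them to high resolution, balancing efficiency and effectiveness. Extensive experiments show that HyCTAS outperforms previous methods on the semantic segmentation task. Code and models are available at \url{this https URL}.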

URL

https://arxiv.org/abs/2403.10413

PDF

https://arxiv.org/pdf/2403.10413.pdf
