Abstract
We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges the Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators that delivers a significant 39% speedup. We also introduce an optimized neural architecture search (NAS) recipe that improves MNv4 search effectiveness. The integration of UIB, Mobile MQA, and the refined NAS recipe yields a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, and GPUs, as well as specialized accelerators such as the Apple Neural Engine and Google Pixel EdgeTPU, a characteristic not found in any other models tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers 87% ImageNet-1K accuracy, with a Pixel 8 EdgeTPU runtime of just 3.8ms.
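The abstract describes UIB as one template that unifies four block types. A minimal structural sketch of that idea, under the assumption (not spelled out in the abstract) that the variants differ only in two optional depthwise convolutions: one before the 1x1 expansion and one between expansion and projection. Layer names here are symbolic placeholders, not the paper's actual implementation:

```python
# Hedged sketch: the four UIB variants as instances of one block template.
# Assumed layout: [optional depthwise] -> 1x1 expand -> [optional depthwise] -> 1x1 project.

def uib_block(dw_before: bool, dw_between: bool) -> list[str]:
    """Return the symbolic layer sequence for one UIB configuration."""
    layers = []
    if dw_before:
        layers.append("depthwise")        # ConvNext-style leading depthwise conv
    layers.append("pointwise_expand")     # 1x1 expansion conv
    if dw_between:
        layers.append("depthwise")        # classic inverted-bottleneck depthwise conv
    layers.append("pointwise_project")    # 1x1 projection conv
    return layers

# Assumed mapping of the four variants named in the abstract:
VARIANTS = {
    "IB":       uib_block(dw_before=False, dw_between=True),
    "ConvNext": uib_block(dw_before=True,  dw_between=False),
    "ExtraDW":  uib_block(dw_before=True,  dw_between=True),
    "FFN":      uib_block(dw_before=False, dw_between=False),
}
```

Under this reading, a NAS search over UIB reduces to two boolean choices per block, which is what makes the template cheap to search.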
URL
https://arxiv.org/abs/2404.10518