
Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware

2024-02-19 02:12:07
Souvik Kundu, Anthony Sarah, Vinay Joshi, Om J Omer, Sreenivas Subramoney

Abstract

With the recent growth in demand for large-scale deep neural networks, compute-in-memory (CiM) has emerged as a prominent solution to alleviate the bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However, constructing CiM hardware poses a challenge because any specific memory hierarchy, in terms of cache sizes and memory bandwidth at different interfaces, may not be ideally matched to a given neural network's attributes, such as tensor dimensions and arithmetic intensity, leading to suboptimal and under-performing systems. Although neural architecture search (NAS) techniques have succeeded in yielding efficient sub-networks for a given hardware metric budget (e.g., DNN execution time or latency), they assume the hardware configuration to be frozen, often yielding sub-optimal sub-networks for a given budget. In this paper, we present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for CiM architectures, creating a Pareto-optimal frontier of downstream task accuracy and execution metrics (e.g., latency). The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices, including bandwidth, processing element size, and memory size. Exhaustive experiments on different model architectures from both the CNN and Transformer families demonstrate the efficacy of CiMNet in finding co-optimized sub-networks and CiM hardware configurations. Specifically, for ImageNet classification accuracy similar to the baseline ViT-B, optimizing only the model architecture increases performance (i.e., reduces workload execution time) by 1.7x, while optimizing both the model architecture and the hardware configuration increases it by 3.1x.
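The core idea described in the abstract, a joint search over a sub-network space and a CiM hardware-configuration space that retains only Pareto-optimal (accuracy, latency) pairs, can be illustrated with a minimal sketch. The code below is not the authors' CiMNet implementation; the search spaces, the random-search loop, and the `evaluate` proxy are hypothetical placeholders standing in for a trained accuracy predictor and a CiM performance model.

```python
"""Minimal sketch of a joint architecture/hardware Pareto search (hypothetical)."""
import random

# Hypothetical sub-network search space (e.g., depth/width/embedding choices).
ARCH_SPACE = {
    "depth": [6, 8, 10, 12],
    "width_mult": [0.5, 0.75, 1.0],
    "embed_dim": [384, 512, 768],
}

# Hypothetical CiM hardware configuration space (bandwidth, PE array, buffer size).
HW_SPACE = {
    "bandwidth_gbps": [32, 64, 128],
    "pe_array": [(64, 64), (128, 128)],
    "buffer_kb": [256, 512, 1024],
}

def sample(space):
    """Draw one random configuration from a discrete search space."""
    return {k: random.choice(v) for k, v in space.items()}

def evaluate(arch, hw):
    """Placeholder objective returning (accuracy, latency_ms).

    In a real flow these values would come from an accuracy predictor and a
    cycle-accurate or analytical CiM performance model, not these formulas.
    """
    flops_proxy = arch["depth"] * arch["width_mult"] * arch["embed_dim"]
    accuracy = 70.0 + 10.0 * (flops_proxy / (12 * 1.0 * 768))
    compute = hw["pe_array"][0] * hw["pe_array"][1]
    latency = flops_proxy / compute * 1e3 / hw["bandwidth_gbps"]
    return accuracy, latency

def pareto_front(points):
    """Keep points not dominated in (higher accuracy, lower latency)."""
    front = []
    for cand in points:
        _, (acc, lat) = cand
        dominated = any(a >= acc and l <= lat and (a > acc or l < lat)
                        for _, (a, l) in points)
        if not dominated:
            front.append(cand)
    return front

if __name__ == "__main__":
    random.seed(0)
    samples = []
    for _ in range(200):  # joint random search over both spaces
        arch, hw = sample(ARCH_SPACE), sample(HW_SPACE)
        samples.append(((arch, hw), evaluate(arch, hw)))
    for (arch, hw), (acc, lat) in pareto_front(samples):
        print(f"acc={acc:.1f}%  latency={lat:.2f}ms  arch={arch}  hw={hw}")
```

Sampling both spaces in the same loop is what distinguishes this from conventional NAS with a frozen hardware configuration: a sub-network that is dominated under one bandwidth or PE-array choice may become Pareto-optimal under another.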

URL

https://arxiv.org/abs/2402.11780

PDF

https://arxiv.org/pdf/2402.11780.pdf

