Abstract
Performance of an end-to-end neural network on a given hardware platform is a function of its compute and memory signature, which, in turn, is governed by a wide range of parameters such as topology size, primitives used, framework used, batching strategy, latency requirements, precision, etc. Current benchmarking tools suffer from limitations: a) they are too granular, like DeepBench; b) they mandate a working implementation that is framework-specific or hardware-architecture-specific; or c) they provide only high-level benchmark metrics. In this paper, we present NTP (Neural Net Topology Profiler), a sophisticated benchmarking framework that effectively identifies the memory and compute signature of an end-to-end topology on multiple hardware architectures, without the need to actually implement the topology in a framework. NTP is tightly integrated with hardware-specific benchmark tools to enable exhaustive data collection and analysis. Using NTP, a deep learning researcher can quickly establish the baselines needed to understand the performance of an end-to-end neural network topology and make high-level architectural decisions based on optimization techniques like layer sizing, quantization, pruning, etc. Further, integration of NTP with frameworks like TensorFlow, PyTorch, Intel OpenVINO, etc. allows for performance comparison along several vectors: a) comparison of different frameworks on a given hardware platform; b) comparison of different hardware platforms using a given framework; c) comparison across different heterogeneous hardware configurations for a given framework; etc. These capabilities empower a researcher to effortlessly make the architectural decisions needed to achieve optimized performance on any hardware platform. The paper documents the architectural approach of NTP and demonstrates the capabilities of the tool by benchmarking Mozilla DeepSpeech, a popular speech recognition topology.
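The "compute and memory signature" the abstract refers to can, in simple cases, be estimated analytically from the topology description alone, without a framework implementation. The sketch below is illustrative only and is not NTP's actual implementation; the function names and the dense-layer-only model are assumptions. It counts multiply-accumulates and weight bytes for a chain of fully connected layers, showing how batch size and precision (`bytes_per_param`, e.g. 4 for fp32 vs. 1 for int8 quantization) shift the signature.

```python
# Illustrative sketch (not NTP's real API): analytically estimate the
# compute and memory signature of a chain of fully connected layers
# directly from the topology description.

def layer_signature(in_features, out_features, batch, bytes_per_param=4):
    """Return (macs, param_bytes) for one dense layer at a given batch size."""
    macs = batch * in_features * out_features               # multiply-accumulates
    param_bytes = in_features * out_features * bytes_per_param  # weights only
    return macs, param_bytes

def topology_signature(layer_sizes, batch, bytes_per_param=4):
    """Aggregate signature for a dense chain, e.g. layer_sizes = [494, 2048, 29]."""
    total_macs = total_bytes = 0
    for i, o in zip(layer_sizes, layer_sizes[1:]):
        macs, pbytes = layer_signature(i, o, batch, bytes_per_param)
        total_macs += macs
        total_bytes += pbytes
    return total_macs, total_bytes
```

Comparing `topology_signature(sizes, batch, bytes_per_param=4)` against the same call with `bytes_per_param=1` gives a first-order view of what int8 quantization would save in weight memory, which is the kind of high-level architectural decision the abstract describes.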
URL
https://arxiv.org/abs/1905.09063