GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures

2024-04-24 21:08:21
Sahil Hassan, Michael Inouye, Miguel C. Gonzalez, Ilkin Aliyev, Joshua Mack, Maisha Hafiz, Ali Akoglu

Abstract

Open-source simulation tools play a crucial role in helping neuromorphic application engineers and hardware architects investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool. It offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. With its flexible and highly parameterized design, RANC has been used by the community to study implementation bottlenecks, to tune architectural parameters or modify neuron behavior based on application insights, and to study the trade space between hardware performance and network accuracy. Designing architectures for neuromorphic computing involves a very large number of configuration parameters, such as the number and precision of weights per neuron, neuron and axon counts per core, network topology, and neuron behavior. To accelerate such studies and provide users with streamlined, productive design space exploration, in this paper we introduce the GPU-based implementation of RANC. We summarize our parallelization approach and quantify the speedup gains achieved with GPU-based tick-accurate simulations across various use cases. We demonstrate up to a 780x speedup over the serial version of the RANC simulator on an MNIST inference application that uses 512 neuromorphic cores. We believe that the RANC ecosystem now provides a much more feasible avenue for exploring different optimizations for accelerating SNNs and for performing richer studies by enabling rapid convergence to optimized neuromorphic architectures.
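
To make the idea of a tick-accurate simulation more concrete, the following is a minimal Python sketch of one simulation tick for a single neuromorphic core. It is not taken from the RANC code base: the `CoreConfig` fields, the `tick` function, and the simple integrate, leak, and fire rule are all assumptions chosen for illustration, and the abstract does not specify RANC's actual neuron model or GPU kernel layout. The sketch only shows why this kind of workload tends to parallelize well: within a tick, each neuron's update reads shared inputs but writes only its own state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CoreConfig:
    """Hypothetical per-core parameters (names are illustrative, not RANC's)."""
    num_axons: int
    num_neurons: int
    threshold: int
    leak: int
    reset_potential: int = 0


def tick(
    config: CoreConfig,
    weights: List[List[int]],   # weights[axon][neuron], assumed integer precision
    potentials: List[int],      # membrane potential per neuron
    axon_spikes: List[int],     # 1 if the axon received a spike this tick
) -> Tuple[List[int], List[int]]:
    """Advance one core by one tick and return (new_potentials, output_spikes)."""
    new_potentials = []
    output_spikes = []
    # Each iteration touches only neuron n's state, so on a GPU this loop
    # could plausibly map to one thread per neuron (an assumption here).
    for n in range(config.num_neurons):
        v = potentials[n]
        # Integrate: add the weight of every axon that spiked this tick.
        for a in range(config.num_axons):
            if axon_spikes[a]:
                v += weights[a][n]
        # Leak: apply a constant per-tick offset.
        v += config.leak
        # Fire: emit a spike and reset when the threshold is reached.
        if v >= config.threshold:
            output_spikes.append(1)
            v = config.reset_potential
        else:
            output_spikes.append(0)
        new_potentials.append(v)
    return new_potentials, output_spikes


# Usage: a toy core with 3 axons and 2 neurons.
cfg = CoreConfig(num_axons=3, num_neurons=2, threshold=3, leak=-1)
w = [[2, 0], [2, 1], [0, 1]]
state, spikes = tick(cfg, w, [0, 0], [1, 1, 0])   # -> ([0, 0], [1, 0])
```

In this toy setup, changing `num_axons`, `num_neurons`, or the weight values corresponds to the kind of configuration parameters the abstract lists, and each change would require re-running every tick of the simulation, which is where a faster simulator helps.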

URL

https://arxiv.org/abs/2404.16208

PDF

https://arxiv.org/pdf/2404.16208.pdf

