
Neurobench: DCASE 2020 Acoustic Scene Classification benchmark on XyloAudio 2

2024-10-31 09:48:12
Weijie Ke, Mina Khoei, Dylan Muir

Abstract

XyloAudio is a line of ultra-low-power audio inference chips designed for in- and near-microphone analysis of audio in real-time, energy-constrained scenarios. Xylo is built around a highly efficient integer-logic processor that simulates parameter- and activity-sparse spiking neural networks (SNNs) using a leaky integrate-and-fire (LIF) neuron model. Neurons on Xylo are quantised integer devices operating in synchronous digital CMOS, with neuron and synapse state quantised to 16 bits and weight parameters quantised to 8 bits. Xylo is tailored for real-time streaming operation, in contrast to the accelerated-time operation of an inference accelerator. XyloAudio includes a low-power audio encoding interface for direct connection to a microphone, designed for sparse encoding of incident audio for further processing by the inference core. In this report we present the results of deploying the DCASE 2020 acoustic scene classification benchmark to XyloAudio 2. We describe the benchmark dataset, the audio preprocessing approach, and the network architecture and training approach. We present the performance of the trained model, and the results of power and latency measurements performed on the XyloAudio 2 development kit. This benchmark is conducted as part of the Neurobench project.
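
To make the quantised LIF description above concrete, here is a minimal Python sketch of a single integer neuron with a signed 16-bit state and signed 8-bit weights. It is an illustration only, not the Xylo hardware logic or any library's API: the bit-shift leak, the subtractive reset, and the names `lif_step`, `run`, `decay_shift`, and `threshold` are all assumptions made for this example.

```python
from typing import List, Tuple

# Signed ranges matching the bit widths quoted in the abstract.
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1   # neuron state
INT8_MIN, INT8_MAX = -(2**7), 2**7 - 1       # weights


def clip(value: int, lo: int, hi: int) -> int:
    """Saturate value into the closed range [lo, hi]."""
    return max(lo, min(hi, value))


def lif_step(v_mem: int, in_spikes: List[int], weights: List[int],
             decay_shift: int, threshold: int) -> Tuple[int, int]:
    """Advance one integer LIF neuron by one time step.

    Returns (new membrane state, 1 if the neuron spiked else 0).
    """
    # Leak: remove a power-of-two fraction of the state (assumed decay rule).
    v_mem -= v_mem >> decay_shift
    # Integrate: add the 8-bit weight of every input that spiked this step.
    for spike, w in zip(in_spikes, weights):
        if spike:
            v_mem += clip(w, INT8_MIN, INT8_MAX)
    # Keep the state inside the signed 16-bit range.
    v_mem = clip(v_mem, INT16_MIN, INT16_MAX)
    # Fire and reset by subtraction once the threshold is reached (assumed).
    if v_mem >= threshold:
        return v_mem - threshold, 1
    return v_mem, 0


def run(spike_train: List[List[int]], weights: List[int],
        decay_shift: int = 4, threshold: int = 100) -> List[int]:
    """Feed a sequence of input spike vectors through one neuron."""
    v_mem, out = 0, []
    for in_spikes in spike_train:
        v_mem, spike = lif_step(v_mem, in_spikes, weights,
                                decay_shift, threshold)
        out.append(spike)
    return out


if __name__ == "__main__":
    # Two input channels, four time steps.
    print(run([[1, 0], [1, 0], [0, 0], [0, 1]], weights=[60, 50]))
```

Because every operation is an integer add, shift, or compare, the neuron does no work on inputs that do not spike, which is one way to see why activity-sparse input keeps computation low.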


URL

https://arxiv.org/abs/2410.23776

PDF

https://arxiv.org/pdf/2410.23776.pdf

