DD-CNN: Depthwise Disout Convolutional Neural Network for Low-complexity Acoustic Scene Classification

2020-07-25 06:02:20

Jingqiao Zhao, Zhen-Hua Feng, Qiuqiang Kong, Xiaoning Song, Xiao-Jun Wu

arXiv_SD

arXiv_SD CNN Detection Classification Pose Scene_Classification


Abstract

This paper presents a Depthwise Disout Convolutional Neural Network (DD-CNN) for the detection and classification of urban acoustic scenes. Specifically, we use log-mel features as the representation of the acoustic signals fed to our network. In the proposed DD-CNN, depthwise separable convolution is used to reduce network complexity. In addition, SpecAugment and Disout are used to further improve performance. Experimental results demonstrate that our DD-CNN can learn discriminative acoustic characteristics from audio fragments while effectively reducing network complexity. Our DD-CNN was used for the low-complexity acoustic scene classification task of the DCASE2020 Challenge, where it achieved 92.04% accuracy on the validation set.
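
To illustrate why depthwise separable convolution reduces complexity, the following minimal sketch compares the weight counts of a standard convolution and its depthwise separable counterpart. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) and the function names are hypothetical choices for illustration, not values taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (bias ignored).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 64, 128

standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)

print(standard)   # 73728
print(separable)  # 8768
```

For these assumed sizes the separable layer needs roughly one eighth of the weights. In general the ratio of separable to standard weights is 1/c_out + 1/k^2, so the saving grows with the kernel size and the number of output channels.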

URL

https://arxiv.org/abs/2007.12864

PDF

https://arxiv.org/pdf/2007.12864.pdf