Abstract
Acoustic scene classification (ASC) is highly important in the real world. Recently, deep learning-based methods have been widely employed for acoustic scene classification. However, these methods are currently not lightweight enough as well as their performance is not satisfactory. To solve these problems, we propose a deep space separable distillation network. Firstly, the network performs high-low frequency decomposition on the log-mel spectrogram, significantly reducing computational complexity while maintaining model performance. Secondly, we specially design three lightweight operators for ASC, including Separable Convolution (SC), Orthonormal Separable Convolution (OSC), and Separable Partial Convolution (SPC). These operators exhibit highly efficient feature extraction capabilities in acoustic scene classification tasks. The experimental results demonstrate that the proposed method achieves a performance gain of 9.8% compared to the currently popular deep learning methods, while also having smaller parameter count and computational complexity.
Abstract (translated)
声场分类(ASC)在现实生活中具有非常重要的意义。然而,基于深度学习的ASC方法目前还不够轻量级,并且其性能也不能令人满意。为了解决这些问题,我们提出了一个深度可分离分蒸馏网络。首先,网络在时域对离散余弦图进行高-低频分解,从而大大降低计算复杂度,同时保持模型性能。其次,我们专门设计三种轻量级的ASC操作符,包括分离卷积(SC)、正交分离卷积(OSC)和分离部分卷积(SPC)。这些操作符在声场分类任务中具有高度高效的特征提取能力。实验结果表明,与目前流行的深度学习方法相比,所提出的方法实现了9.8%的性能提升,同时具有更小的参数数量和计算复杂度。
URL
https://arxiv.org/abs/2405.03567