Abstract
Acoustic event detection and scene classification are major research tasks in environmental sound analysis, and many methods based on neural networks have been proposed. Conventional methods address these tasks separately; however, acoustic events and scenes are closely related. For example, in the acoustic scene ``office'', the acoustic events ``mouse clicking'' and ``keyboard typing'' are likely to occur. In this paper, we propose multitask learning for joint analysis of acoustic events and scenes, in which the event-detection and scene-classification networks share the layers that hold information common to both tasks. By integrating the two networks, we expect information on acoustic scenes to improve the performance of acoustic event detection. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves acoustic event detection by 10.66 percentage points in terms of the F-score, compared with a conventional method based on a convolutional recurrent neural network.
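The shared-layer idea above can be sketched as follows. This is a hypothetical minimal illustration, not the paper's network: a single shared layer stands in for the shared CRNN feature extractor, and it feeds two task-specific heads, a frame-level multi-label head for event detection (sigmoid) and a clip-level single-label head for scene classification (softmax). All layer sizes, label counts, and the equal loss weighting are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative sizes (assumed, not from the paper).
n_frames, n_mels = 50, 40     # log-mel spectrogram input: frames x bins
n_events, n_scenes = 6, 3     # number of event / scene classes
hidden = 32

# Shared layer: holds information common to events and scenes.
W_shared = rng.standard_normal((n_mels, hidden)) * 0.1
# Task-specific heads branching off the shared representation.
W_event = rng.standard_normal((hidden, n_events)) * 0.1
W_scene = rng.standard_normal((hidden, n_scenes)) * 0.1

x = rng.standard_normal((n_frames, n_mels))      # one input clip
h = np.tanh(x @ W_shared)                        # shared representation

event_probs = sigmoid(h @ W_event)               # per-frame event activities
scene_probs = softmax(h.mean(axis=0) @ W_scene)  # clip-level scene posterior

# Multitask objective: weighted sum of the two task losses
# (binary cross-entropy for events, cross-entropy for the scene).
y_event = (rng.random((n_frames, n_events)) > 0.8).astype(float)
y_scene = np.eye(n_scenes)[0]
bce = -np.mean(y_event * np.log(event_probs + 1e-9)
               + (1 - y_event) * np.log(1 - event_probs + 1e-9))
ce = -np.sum(y_scene * np.log(scene_probs + 1e-9))
loss = 1.0 * bce + 1.0 * ce                      # equal weights assumed
```

Training such a model updates `W_shared` with gradients from both losses, which is the mechanism by which scene information can inform event detection.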
URL
https://arxiv.org/abs/1904.12146