Beyond Equal-Length Snippets: How Long is Sufficient to Recognize an Audio Scene?

2018-11-02 21:22:58

Huy Phan, Oliver Y. Chén, Philipp Koch, Lam Pham, Ian McLoughlin, Alfred Mertins, Maarten De Vos

arXiv_SD

Abstract

Due to the variability in the characteristics of audio scenes, some scenes can naturally be recognized earlier, i.e. after a shorter duration, than others. In this work, rather than using equal-length snippets for all scene categories, as is common in the literature, we study the temporal extent to which an audio scene can be reliably recognized. For modelling, in addition to two single-network systems, one relying on a convolutional neural network and the other on a recurrent neural network, we also investigate early fusion and late fusion of these two single networks for audio scene classification. Moreover, as model fusion is prevalent in audio scene classifiers, we further aim to study whether and when model fusion is really necessary for this task.
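
As a rough illustration only (this is not the authors' models or code), the sketch below shows generic forms of the two ideas the abstract names. It assumes that late fusion means a weighted average of two networks' class posteriors, that early fusion means concatenating their feature vectors before a single classifier, and that "reliably recognized" can be approximated by a hypothetical per-example rule: the shortest snippet duration at which the predicted class is correct with probability at or above a chosen threshold. The function names `late_fusion`, `early_fusion`, `earliest_reliable_duration` and the parameters `w` and `threshold` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

def late_fusion(p_a: Sequence[float], p_b: Sequence[float], w: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted average of two class-posterior vectors
    (e.g. from a CNN and an RNN). w weights the first model."""
    if len(p_a) != len(p_b):
        raise ValueError("posterior vectors must have the same length")
    return [w * a + (1.0 - w) * b for a, b in zip(p_a, p_b)]

def early_fusion(f_a: Sequence[float], f_b: Sequence[float]) -> List[float]:
    """Hypothetical early fusion: concatenate the two networks' feature vectors
    so that a single downstream classifier sees both representations."""
    return list(f_a) + list(f_b)

def earliest_reliable_duration(
    posteriors_by_duration: Dict[float, Sequence[float]],
    true_label: int,
    threshold: float = 0.8,
) -> Optional[float]:
    """Hypothetical decision rule: return the shortest snippet duration whose
    posterior predicts true_label with probability >= threshold, else None."""
    for duration in sorted(posteriors_by_duration):
        p = posteriors_by_duration[duration]
        # Index of the most probable class (first one wins on ties).
        pred = max(range(len(p)), key=p.__getitem__)
        if pred == true_label and p[pred] >= threshold:
            return duration
    return None
```

The threshold rule is only one way to operationalize "how long is sufficient"; an evaluation over a whole test set would more likely compare accuracy across snippet lengths per scene category.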

URL

https://arxiv.org/abs/1811.01095

PDF

https://arxiv.org/pdf/1811.01095.pdf