Abstract
Current audio classification models have small class vocabularies relative to the large number of sound event classes of interest in the real world. Thus, they provide a limited view of the world that may miss important yet unexpected or unknown sound events. To address this issue, open-set audio classification techniques have been developed to detect sound events from unknown classes. Although these methods have been applied to a multi-class context in audio, such as sound scene classification, they have yet to be investigated for polyphonic audio in which sound events overlap, requiring the use of multi-label models. In this study, we establish the problem of multi-label open-set audio classification by creating a dataset with varying unknown class distributions and evaluating baseline approaches built upon existing techniques.
URL
https://arxiv.org/abs/2310.13759