Benchmarking Representations for Speech, Music, and Acoustic Events

2024-05-02 01:24:53

Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi

arXiv_SD

Abstract

Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains, covering acoustic events, music, and speech. ARCH comprises 12 datasets that allow us to thoroughly assess pre-trained self-supervised learning (SSL) models of different sizes. ARCH streamlines the benchmarking of ARL techniques through unified access to a wide range of domains and the ability to readily incorporate new datasets and models. To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets. We argue that this wide-ranging evaluation provides valuable insights into state-of-the-art ARL methods and helps pinpoint promising research directions.
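The abstract describes a benchmark that gives unified access to many datasets and can readily incorporate new datasets and models. The sketch below illustrates that plug-in idea under stated assumptions; it is not ARCH's code or API. It assumes the common ARL evaluation pattern of a frozen encoder whose frame-level embeddings are mean-pooled into one vector per clip, then scored per dataset by a lightweight classifier. For brevity, a nearest-centroid classifier stands in for a trained linear probe. `Benchmark`, the toy encoders, and the toy data are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Embedding = List[float]
# Hypothetical encoder type: waveform samples -> frame-level embeddings.
Encoder = Callable[[Sequence[float]], List[Embedding]]
# A split is a list of (waveform, label) pairs.
Split = List[Tuple[Sequence[float], str]]


def mean_pool(vectors: List[Embedding]) -> Embedding:
    """Average a list of equal-length vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def probe_accuracy(encoder: Encoder, train: Split, test: Split) -> float:
    """Nearest-centroid probe on frozen, mean-pooled clip embeddings."""
    by_label: Dict[str, List[Embedding]] = {}
    for wav, label in train:
        by_label.setdefault(label, []).append(mean_pool(encoder(wav)))
    centroids = {lab: mean_pool(vecs) for lab, vecs in by_label.items()}

    def predict(vec: Embedding) -> str:
        # Squared Euclidean distance; ties go to the first label seen.
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(vec, centroids[lab])))

    correct = sum(predict(mean_pool(encoder(wav))) == label
                  for wav, label in test)
    return correct / len(test)


@dataclass
class Benchmark:
    """Hypothetical registry: new datasets and models are added by name."""
    datasets: Dict[str, Tuple[Split, Split]] = field(default_factory=dict)
    models: Dict[str, Encoder] = field(default_factory=dict)

    def add_dataset(self, name: str, train: Split, test: Split) -> None:
        self.datasets[name] = (train, test)

    def add_model(self, name: str, encoder: Encoder) -> None:
        self.models[name] = encoder

    def run(self) -> Dict[str, Dict[str, float]]:
        results: Dict[str, Dict[str, float]] = {}
        for model_name, encoder in self.models.items():
            scores = {name: probe_accuracy(encoder, tr, te)
                      for name, (tr, te) in self.datasets.items()}
            scores["average"] = mean(list(scores.values()))
            results[model_name] = scores
        return results


def frames_of(wav: Sequence[float], size: int = 4) -> List[Sequence[float]]:
    return [wav[i:i + size] for i in range(0, len(wav), size)]


def toy_energy(wav: Sequence[float]) -> List[Embedding]:
    # Toy "model": per-frame mean and mean energy.
    return [[sum(f) / len(f), sum(x * x for x in f) / len(f)]
            for f in frames_of(wav)]


def toy_mean_only(wav: Sequence[float]) -> List[Embedding]:
    # Weaker toy "model": per-frame mean only.
    return [[sum(f) / len(f)] for f in frames_of(wav)]


train = [([0.0] * 8, "silence"), ([0.1, -0.1] * 4, "silence"),
         ([1.0, -1.0] * 4, "tone"), ([0.9, -0.9] * 4, "tone")]
test = [([0.05, -0.05] * 4, "silence"), ([0.8, -0.8] * 4, "tone")]

bench = Benchmark()
bench.add_dataset("toy_events", train, test)
bench.add_model("toy_energy", toy_energy)
bench.add_model("toy_mean_only", toy_mean_only)
results = bench.run()
print(results)
```

Adding a dataset or model is a single registration call, and every registered model is scored on every registered dataset with the same frozen-embedding protocol, which is what makes the per-dataset and averaged scores comparable across models.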

URL

https://arxiv.org/abs/2405.00934

PDF

https://arxiv.org/pdf/2405.00934.pdf