DDS: A new device-degraded speech dataset for speech enhancement

2021-09-16 12:27:15

Haoyu Li, Junichi Yamagishi

arXiv_SD

arXiv_SD Enhancement Speech

Abstract

A large and growing amount of speech content in real-life scenarios is recorded on common consumer devices in uncontrolled environments, resulting in degraded speech quality. Transforming such low-quality, device-degraded speech into high-quality speech is a goal of speech enhancement (SE). This paper introduces a new speech dataset, DDS, to facilitate research on SE. DDS provides aligned parallel recordings of high-quality speech (recorded in professional studios) and multiple versions of low-quality speech, totaling approximately 2,000 hours of speech data. The DDS dataset covers 27 realistic recording conditions by combining diverse acoustic environments and microphone devices, and each version of a condition consists of multiple recordings from six different microphone positions to simulate various signal-to-noise ratio (SNR) and reverberation levels. We also test several SE baseline systems on the DDS dataset and show the impact of recording diversity on performance.

URL

https://arxiv.org/abs/2109.07931

PDF

https://arxiv.org/pdf/2109.07931.pdf