Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model

2019-11-10 16:05:21

Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, Junichi Yamagishi

arXiv_SD

arXiv_SD GAN Adversarial Pose Enhancement Speech

Abstract

Nowadays, vast amounts of speech data are recorded on low-quality recording devices such as smartphones, tablets, laptops, and medium-quality microphones. The objective of this research was to study the automatic generation of high-quality speech from such low-quality device-recorded speech, which could then be applied to many speech-generation tasks. In this paper, we first introduce our new device-recorded speech dataset and then propose an improved end-to-end method for automatically transforming low-quality device-recorded speech into professional high-quality speech. Our method is an extension of a generative adversarial network (GAN)-based speech enhancement model called speech enhancement GAN (SEGAN), and we present two modifications to make model training more robust and stable. Finally, from a large-scale listening test, we show that our method can significantly enhance the quality of device-recorded speech signals.

URL

https://arxiv.org/abs/1911.03952

PDF

https://arxiv.org/pdf/1911.03952.pdf