Exploring WavLM on Speech Enhancement

2022-11-18 02:23:16

Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

arXiv_SD

arXiv_SD Speech_Recognition Recognition Pose Self-Supervised Enhancement Speech

Abstract

Interest in self-supervised learning approaches for end-to-end speech encoding has surged in recent years, as these approaches have achieved great success. In particular, WavLM has shown state-of-the-art performance on various speech processing tasks. To better understand the efficacy of self-supervised learning models for speech enhancement, in this work we design and conduct a series of experiments under three resource conditions by combining WavLM with two high-quality speech enhancement systems. We also propose a regression-based WavLM training objective and a noise-mixing data configuration to further boost downstream enhancement performance. Experiments on the DNS challenge dataset and a simulation dataset show that WavLM benefits the speech enhancement task in terms of both speech quality and speech recognition accuracy, especially in the low fine-tuning resource condition. In the high fine-tuning resource condition, only the word error rate is substantially improved.

URL

https://arxiv.org/abs/2211.09988

PDF

https://arxiv.org/pdf/2211.09988.pdf