SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement

2022-10-30 10:36:11

Zhibin Qiu, Mengfan Fu, Yinfeng Yu, LiLi Yin, Fuchun Sun, Hao Huang

arXiv_SD

Abstract

Diffusion models, a recent class of generative models that is popular in image generation and audio synthesis, are rarely used in speech enhancement. In this paper, we use a diffusion model as a module for stochastic refinement. We propose SRTNet, a novel method for speech enhancement via Stochastic Refinement entirely in the Time domain. Specifically, we design a joint network consisting of a deterministic module and a stochastic module, which together form the "enhance-and-refine" paradigm. We demonstrate the feasibility of our method theoretically and show experimentally that it achieves faster training, faster sampling, and higher quality. Our code and enhanced samples are available at this https URL.

URL

https://arxiv.org/abs/2210.16805

PDF

https://arxiv.org/pdf/2210.16805.pdf