Abstract
We propose SE-Bridge, a novel method for speech enhancement (SE). With the recent application of diffusion models to speech enhancement, SE can be achieved by solving a stochastic differential equation (SDE). Each SDE corresponds to a probability flow ordinary differential equation (PF-ODE), and the trajectory of the PF-ODE solution consists of the speech states at different moments. Our approach is based on a consistency model, which ensures that all speech states on the same PF-ODE trajectory correspond to the same initial state. By integrating the Brownian bridge process, the model is able to generate high-intelligibility speech samples without adversarial training. This is the first attempt to apply consistency models to the SE task, achieving state-of-the-art results on several metrics while requiring 15× less sampling time than the diffusion-based baseline. Our experiments on multiple datasets demonstrate the effectiveness of SE-Bridge for SE. Furthermore, extensive experiments on downstream tasks, including Automatic Speech Recognition (ASR) and Speaker Verification (SV), show that SE-Bridge can effectively support multiple downstream tasks.
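To illustrate the Brownian bridge process mentioned above, here is a minimal sketch of sampling an intermediate state from a textbook Brownian bridge pinned at a clean signal at t = 0 and a noisy signal at t = T. The function name `brownian_bridge_sample`, the parameters `T` and `sigma`, and the toy sine signal are assumptions made for illustration. This is the standard bridge, not necessarily the exact parameterization used in SE-Bridge.

```python
import numpy as np

def brownian_bridge_sample(x0, y, t, T=1.0, sigma=1.0, rng=None):
    """Draw x_t from a standard Brownian bridge pinned at x0 (t=0) and y (t=T).

    Hypothetical helper for illustration only; the paper's own schedule
    and scaling may differ.
    """
    if not 0.0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    y = np.asarray(y, dtype=float)
    # The mean interpolates linearly between the two endpoints.
    mean = x0 + (t / T) * (y - x0)
    # The variance vanishes at both endpoints and peaks at t = T / 2.
    std = sigma * np.sqrt(t * (T - t) / T)
    return mean + std * rng.standard_normal(x0.shape)

# Toy usage: a sine wave stands in for clean speech (assumed data).
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 8))
noisy = clean + 0.3 * rng.standard_normal(8)
midpoint_state = brownian_bridge_sample(clean, noisy, t=0.5, rng=rng)
```

Because the variance is zero at both ends, the sampled state equals the clean signal at t = 0 and the noisy signal at t = T, which is what makes a bridge a natural way to connect the two in an enhancement setting.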
URL
https://arxiv.org/abs/2305.13796