SE-Bridge: Speech Enhancement with Consistent Brownian Bridge

Abstract

We propose SE-Bridge, a novel method for speech enhancement (SE). Following the recent application of diffusion models to speech enhancement, enhancement can be achieved by solving a stochastic differential equation (SDE). Each SDE corresponds to a probability flow ordinary differential equation (PF-ODE), and the trajectory of the PF-ODE solution consists of the speech states at different moments. Our approach is based on consistency models, which ensure that all speech states on the same PF-ODE trajectory correspond to the same initial state. By integrating the Brownian Bridge process, the model is able to generate high-intelligibility speech samples without adversarial training. This is the first attempt to apply consistency models to the SE task, achieving state-of-the-art results on several metrics while sampling 15 times faster than the diffusion-based baseline. Our experiments on multiple datasets demonstrate the effectiveness of SE-Bridge for SE. Furthermore, extensive experiments on downstream tasks, including Automatic Speech Recognition (ASR) and Speaker Verification (SV), show that SE-Bridge can effectively support multiple downstream tasks.
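
To make the Brownian Bridge idea concrete, the following is a minimal sketch of drawing an intermediate state from a standard Brownian bridge pinned at a clean signal at time 0 and a noisy signal at time T. The function name `brownian_bridge_sample`, its parameters, and the constant diffusion scale `sigma` are assumptions made for illustration; the abstract does not give the exact parameterization used by SE-Bridge, so this should not be read as the authors' implementation.

```python
import numpy as np

def brownian_bridge_sample(x0, y, t, T=1.0, sigma=1.0, rng=None):
    """Draw x_t from a Brownian bridge pinned at x0 (t=0) and y (t=T).

    Hypothetical helper: the marginal is Gaussian with a mean that
    interpolates linearly between the endpoints and a variance of
    sigma^2 * t * (T - t) / T, which vanishes at both endpoints.
    """
    if rng is None:
        rng = np.random.default_rng()
    x0 = np.asarray(x0, dtype=float)
    y = np.asarray(y, dtype=float)
    w = t / T
    mean = (1.0 - w) * x0 + w * y            # linear interpolation of endpoints
    std = sigma * np.sqrt(t * (T - t) / T)   # zero at t=0 and t=T
    return mean + std * rng.standard_normal(x0.shape)

# Usage: an intermediate state between a stand-in clean and noisy signal.
clean = np.zeros(8)
noisy = np.ones(8)
x_mid = brownian_bridge_sample(clean, noisy, t=0.5, rng=np.random.default_rng(0))
```

Because the variance is zero at both ends, the process starts exactly at the clean signal and ends exactly at the noisy one. A consistency model would then be trained to map any such intermediate state back to the same initial state.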

Abstract (translated)

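We propose SE-Bridge, a novel method for speech enhancement (SE). Diffusion models have recently been applied to speech enhancement, so that enhancement can be achieved by solving a stochastic differential equation (SDE). Each SDE corresponds to a probability flow ordinary differential equation (PF-ODE), and the trajectory of the PF-ODE solution contains the speech states at different moments. Our method is based on consistency models, which ensure that any speech states on the same PF-ODE trajectory correspond to the same initial state. By integrating the Brownian Bridge process, the model can generate high-intelligibility speech samples without adversarial training. This is the first attempt to apply consistency models to the SE task; it achieves state-of-the-art results on several metrics while sampling 15 times faster than the diffusion-based baseline. Our experiments on multiple datasets show that SE-Bridge is effective for SE. In addition, extensive experiments on downstream tasks, including Automatic Speech Recognition (ASR) and Speaker Verification (SV), show that SE-Bridge can effectively support multiple downstream tasks.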

URL

https://arxiv.org/abs/2305.13796

PDF

https://arxiv.org/pdf/2305.13796.pdf