A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2

2024-04-11 08:09:57

Thomas Serre (S2A, IDS), Mathieu Fontaine (S2A, IDS), Éric Benhaim, Geoffroy Dutour, Slim Essid (S2A, IDS)

arXiv_SD

Abstract

Isolating the desired speaker's voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker's voice. Recent research efforts have yielded promising PSE models, albeit often accompanied by computationally intensive architectures, unsuitable for resource-constrained embedded devices. In this paper, we introduce a novel method to personalize a lightweight dual-stage Speech Enhancement (SE) model and implement it within DeepFilterNet2, an SE model renowned for its state-of-the-art performance. We seek an optimal integration of speaker information within the model, exploring different positions for the integration of the speaker embeddings within the dual-stage enhancement architecture. We also investigate a tailored training strategy when adapting DeepFilterNet2 to a PSE task. We show that our personalization method greatly improves the performance of DeepFilterNet2 while preserving minimal computational overhead.

Abstract (translated)

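Isolating the desired speaker's voice in a noisy environment is a challenging task. To achieve this, personalized speech enhancement (PSE) methods exploit prior knowledge of the speaker's voice. Although recent research has produced promising PSE models, these often come with computationally intensive architectures that are unsuitable for resource-constrained embedded devices. In this paper, we propose a new method for personalizing a lightweight dual-stage speech enhancement (SE) model and implement it in DeepFilterNet2, a network known for its state-of-the-art performance. We seek the best way to integrate speaker information in the model, exploring different positions for the speaker embeddings within the dual-stage enhancement architecture. We also study a suitable training strategy for adapting DeepFilterNet2 to the PSE task. We show that our personalization method improves the performance of DeepFilterNet2 while keeping the computational overhead minimal.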

URL

https://arxiv.org/abs/2404.08022

PDF

https://arxiv.org/pdf/2404.08022.pdf
