Abstract
Deep learning has the potential to enhance speech signals and increase their intelligibility for users of hearing aids. Deep models suited for real-world applications should feature low computational complexity and a processing delay of only a few milliseconds. In this paper, we explore deep speech enhancement that meets these requirements and contrast monaural and binaural processing algorithms in two complex acoustic scenes. Both algorithms are evaluated with objective metrics and in experiments with hearing-impaired listeners performing a speech-in-noise test. Results are compared to two traditional enhancement strategies, i.e., adaptive differential microphone processing and binaural beamforming. While all algorithms perform similarly in diffuse noise, the binaural deep learning approach performs best in the presence of spatial interferers. A post hoc analysis attributes this to improvements at low SNRs and to precise spatial filtering.
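Of the two traditional baselines named above, adaptive differential microphone (ADM) processing is the simplest to illustrate. The following is a minimal first-order ADM sketch in Python; the function name, the NLMS step size mu, and the assumption that the microphone spacing equals one sample of acoustic travel time at 16 kHz are ours for illustration, not details taken from the paper.

```python
import numpy as np

def adaptive_differential_microphone(x_front, x_rear, mu=0.05, eps=1e-8):
    """First-order adaptive differential microphone (ADM), sketched.

    Assumes the front/rear microphone spacing equals one sample of
    acoustic travel time (roughly 2.1 cm at fs = 16 kHz), so the
    inter-microphone delay reduces to a one-sample shift.
    """
    y = np.zeros(len(x_front))
    beta = 0.0  # null-steering coefficient, constrained to [0, 1]
    for t in range(1, len(x_front)):
        c_f = x_front[t] - x_rear[t - 1]   # forward-facing cardioid
        c_b = x_rear[t] - x_front[t - 1]   # backward-facing cardioid
        y[t] = c_f - beta * c_b            # steer a null toward the rear
        # NLMS step minimizing output power; clipping beta to [0, 1]
        # keeps the null in the rear half-plane, preserving a frontal target.
        beta = min(max(beta + mu * y[t] * c_b / (c_b * c_b + eps), 0.0), 1.0)
    return y
```

The sample-by-sample time-domain loop also shows why such classical schemes incur almost no algorithmic delay, which is the budget of a few milliseconds against which the deep models discussed above are constrained.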
URL
https://arxiv.org/abs/2405.01967