NESC: Robust Neural End-2-End Speech Coding with GANs

2022-07-07 13:23:58

Nicola Pia, Kishan Gupta, Srikanth Korse, Markus Multrus, Guillaume Fuchs

arXiv_SD

Abstract

Neural networks have proven to be a formidable tool for tackling the problem of speech coding at very low bit rates. However, the design of a neural coder that operates robustly under real-world conditions remains a major challenge. We therefore present the Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The encoder uses a new architecture configuration that relies on our proposed Dual-PathConvRNN (DPCRNN) layer, while the decoder architecture is based on our previous work, Streamwise-StyleMelGAN. Our subjective listening tests on clean and noisy speech show that NESC is particularly robust to unseen conditions and signal perturbations.
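
To give a sense of how tight a 3 kbps budget is, the following minimal sketch computes the bits available per frame and the compression ratio relative to uncompressed PCM. The 10 ms frame length, the 16 kHz sampling rate, and the 16-bit sample depth are illustrative assumptions (typical values for wideband speech) and are not taken from the abstract; the function names are hypothetical.

```python
def bits_per_frame(bitrate_bps: float, frame_ms: float) -> float:
    """Number of bits the codec may spend on one frame of the given duration."""
    return bitrate_bps * frame_ms / 1000.0


def compression_ratio(sample_rate_hz: int, bits_per_sample: int, bitrate_bps: float) -> float:
    """Ratio between the raw PCM bit rate and the coded bit rate."""
    return sample_rate_hz * bits_per_sample / bitrate_bps


BITRATE_BPS = 3000        # 3 kbps, as stated in the abstract
FRAME_MS = 10.0           # assumption: hypothetical 10 ms frame
SAMPLE_RATE_HZ = 16000    # assumption: typical wideband sampling rate
BITS_PER_SAMPLE = 16      # assumption: 16-bit PCM reference

budget = bits_per_frame(BITRATE_BPS, FRAME_MS)
ratio = compression_ratio(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, BITRATE_BPS)

print(f"bits per {FRAME_MS:.0f} ms frame: {budget:.0f}")
print(f"compression ratio vs. PCM: {ratio:.1f}x")
```

Under these assumptions the codec has 30 bits per frame, and the coded stream is roughly 85 times smaller than the raw PCM signal.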

URL

https://arxiv.org/abs/2207.03282

PDF

https://arxiv.org/pdf/2207.03282.pdf