Disentangling speech from surroundings in a neural audio codec

2022-03-29 13:58:33

Ahmed Omran, Neil Zeghidour, Zalán Borsos, Félix de Chaumont Quitry, Malcolm Slaney, Marco Tagliasacchi

arXiv_SD Embedding Speech

Abstract

We present a method to separate speech signals from noisy environments in the compressed domain of a neural audio codec. We introduce a new training procedure that allows our model to produce structured encodings of audio waveforms given by embedding vectors, where one part of the embedding vector represents the speech signal, and the rest represents the environment. We achieve this by partitioning the embeddings of different input waveforms and training the model to faithfully reconstruct audio from mixed partitions, thereby ensuring each partition encodes a separate audio attribute. As use cases, we demonstrate the separation of speech from background noise or from reverberation characteristics. Our method also allows for targeted adjustments of the audio output characteristics.
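
The partition-mixing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the contiguous split along the channel dimension, the value of `speech_dims`, and the array sizes are all assumptions made for illustration, and random arrays stand in for the outputs of a real codec encoder.

```python
import numpy as np

def split_embedding(z, speech_dims):
    """Split an embedding of shape (frames, dims) into two partitions.

    Assumption: the first `speech_dims` channels hold speech and the rest
    hold the environment. The abstract does not specify the layout.
    """
    return z[:, :speech_dims], z[:, speech_dims:]

def mix_partitions(z_a, z_b, speech_dims):
    """Combine the speech partition of z_a with the environment partition of z_b."""
    speech_a, _ = split_embedding(z_a, speech_dims)
    _, env_b = split_embedding(z_b, speech_dims)
    return np.concatenate([speech_a, env_b], axis=1)

# Random stand-ins for the encoder outputs of two input waveforms (hypothetical sizes).
rng = np.random.default_rng(0)
frames, dims, speech_dims = 4, 8, 5
z_a = rng.normal(size=(frames, dims))
z_b = rng.normal(size=(frames, dims))

# Mixed embedding: speech channels from waveform A, environment channels from waveform B.
z_mixed = mix_partitions(z_a, z_b, speech_dims)
```

Under these assumptions, training would ask the decoder to reconstruct audio from embeddings like `z_mixed`, which pushes each partition to carry only its own attribute.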

URL

https://arxiv.org/abs/2203.15578

PDF

https://arxiv.org/pdf/2203.15578.pdf