MANNER: Multi-view Attention Network for Noise Erasure

2022-03-04 08:27:35

Hyun Joon Park, Byung Ha Kang, Wooseok Shin, Jin Sob Kim, Sung Won Han

arXiv_SD CNN Attention Pose Enhancement Speech

Abstract

In the field of speech enhancement, time-domain methods struggle to achieve both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but their representations remain limited and their memory efficiency is poor. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), which consists of a convolutional encoder-decoder with a multi-view attention block and is applied to time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset in terms of five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while efficiently processing noisy speech.

URL

https://arxiv.org/abs/2203.02181

PDF

https://arxiv.org/pdf/2203.02181.pdf