End-to-End Neural Audio Coding for Real-Time Communications

2022-01-24 03:06:30

Xue Jiang, Xiulian Peng, Chengyu Zheng, Huaying Xue, Yuan Zhang, Yan Lu

arXiv_SD

Abstract

Deep-learning-based methods have shown advantages over traditional ones in audio coding, but limited attention has been paid to real-time communications (RTC). This paper proposes TFNet, an end-to-end neural audio codec with low latency for RTC. It adopts an encoder-temporal filtering-decoder paradigm that has seldom been investigated in audio coding. An interleaved structure is proposed for temporal filtering to capture both short-term and long-term temporal dependencies. Furthermore, through end-to-end optimization, TFNet is jointly optimized with speech enhancement and packet loss concealment, yielding a one-for-all network for the three tasks. Both subjective and objective results demonstrate the efficiency of the proposed TFNet.
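The abstract describes the encoder-temporal filtering-decoder paradigm only at a high level. Below is a minimal NumPy sketch of that paradigm under stated assumptions; it is not the TFNet architecture. The frame size, latent width, dilations, per-frame linear encoder/decoder, and plain tanh recurrent layer are all hypothetical choices, and quantization, speech enhancement, and packet loss concealment are omitted. In this sketch, "interleaved" means that each stage alternates a stack of causal dilated convolutions (short-term context) with a recurrent layer (long-term context).

```python
import numpy as np

FRAME = 160   # hypothetical: 10 ms frames at 16 kHz
LATENT = 32   # hypothetical latent channel count

rng = np.random.default_rng(0)

def init(*shape):
    # Small random weights; a real codec would learn these end to end.
    return rng.standard_normal(shape) * 0.1

def causal_dilated_conv(x, w, dilation):
    """Causal 1-D convolution over time. x: (T, C), w: (K, C, C).
    Output frame t only uses x[t], x[t - d], ..., x[t - (K-1)d]."""
    T, C = x.shape
    K = w.shape[0]
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((pad, C)), x], axis=0)  # left-pad only
    y = np.zeros((T, w.shape[2]))
    for k in range(K):
        start = pad - k * dilation  # tap k looks back k*dilation frames
        y += xp[start:start + T] @ w[k]
    return y

def short_term_block(x, weights, dilations=(1, 2, 4)):
    # Residual stack of causal dilated convolutions (short-term context).
    for w, d in zip(weights, dilations):
        x = x + np.maximum(causal_dilated_conv(x, w, d), 0.0)
    return x

def long_term_block(x, w_in, w_rec):
    # Simple recurrent layer run frame by frame (long-term context).
    h = np.zeros(w_rec.shape[0])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ w_in + h @ w_rec)
        out[t] = h
    return x + out  # residual connection

def make_params(n_stages=2, kernel=3):
    return {
        "enc": init(FRAME, LATENT),
        "stages": [
            ([init(kernel, LATENT, LATENT) for _ in range(3)],
             init(LATENT, LATENT), init(LATENT, LATENT))
            for _ in range(n_stages)
        ],
        "dec": init(LATENT, FRAME),
    }

def codec_sketch_forward(wave, p):
    T = len(wave) // FRAME
    frames = wave[: T * FRAME].reshape(T, FRAME)
    z = frames @ p["enc"]                  # encoder: per-frame projection
    for convs, w_in, w_rec in p["stages"]:  # interleaved temporal filtering
        z = short_term_block(z, convs)
        z = long_term_block(z, w_in, w_rec)
    return (z @ p["dec"]).reshape(-1)      # decoder: per-frame synthesis

params = make_params()
wave = np.random.default_rng(1).standard_normal(1600)  # 0.1 s of noise
out = codec_sketch_forward(wave, params)
```

Every component in this sketch is causal, so each output frame depends only on the current and past input frames. That is one way to keep the algorithmic latency at a single frame, which matters for RTC. Changing samples from frame 5 onward leaves the first five output frames unchanged.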

URL

https://arxiv.org/abs/2201.09429

PDF

https://arxiv.org/pdf/2201.09429.pdf