HarmoF0: Logarithmic Scale Dilated Convolution For Pitch Estimation

2022-05-02 16:45:20

Weixing Wei, Peilin Li, Yi Yu, Wei Li

arXiv_AI

arXiv_AI CNN Pose


Abstract

Sounds, especially music, contain harmonic components scattered along the frequency dimension, and standard convolutional neural networks have difficulty observing these overtones. This paper introduces a multiple rates dilated causal convolution (MRDC-Conv) method that efficiently captures the harmonic structure in logarithmic-scale spectrograms. This harmonic structure is useful for pitch estimation, a task that underlies many sound-processing applications. We propose HarmoF0, a fully convolutional network, to evaluate MRDC-Conv and other dilated convolutions for pitch estimation. The results show that this model outperforms DeepF0 and achieves state-of-the-art performance on three datasets while using more than 90% fewer parameters. We also find that it is more robust to noise and makes fewer octave errors.
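The reason a log-frequency axis helps is that harmonic k of a fundamental sits a fixed distance of B·log2(k) bins above it, where B is the number of bins per octave, regardless of the fundamental's pitch. A convolution whose taps are spaced at these irregular offsets can therefore gather all overtones onto the fundamental's bin. The sketch below illustrates that idea under stated assumptions. It is not the authors' MRDC-Conv implementation: the function names, the per-harmonic scalar `weights` (standing in for learned convolution kernels), and the choice to look only upward in frequency with zero padding are hypothetical simplifications.

```python
import numpy as np

def harmonic_offsets(num_harmonics, bins_per_octave):
    """Bin offset of harmonic k (k = 1..num_harmonics) above the fundamental
    on a log-frequency axis: bins_per_octave * log2(k), rounded to a bin."""
    ks = np.arange(1, num_harmonics + 1)
    return np.rint(bins_per_octave * np.log2(ks)).astype(int)

def harmonic_dilated_sum(spec, weights, bins_per_octave):
    """Sum weighted, frequency-shifted copies of a log-frequency spectrogram.

    spec: array of shape (freq_bins, frames).
    weights: one hypothetical scalar per harmonic (a stand-in for learned kernels).
    Output bin f collects spec[f + offset_k] for each harmonic k, i.e. a kernel
    whose taps use the irregular "dilations" offset_k. Only higher bins are read;
    taps that fall past the top of the spectrum contribute zero.
    """
    freq_bins = spec.shape[0]
    out = np.zeros_like(spec, dtype=float)
    offsets = harmonic_offsets(len(weights), bins_per_octave)
    for w, d in zip(weights, offsets):
        if d < freq_bins:
            # Shift the spectrogram down by d bins so harmonic k lines up with f.
            out[: freq_bins - d] += w * spec[d:]
    return out

# Usage: semitone resolution (12 bins per octave), a synthetic note whose
# fundamental is at bin 10 with its first four harmonics set to 1.
B = 12
spec = np.zeros((48, 1))
for d in harmonic_offsets(4, B):
    spec[10 + d, 0] = 1.0
out = harmonic_dilated_sum(spec, np.ones(4), B)
print(int(np.argmax(out[:, 0])))  # 10
```

With B = 12 the offsets are 0, 12, 19 and 24 bins, so the summed response peaks at the fundamental (bin 10, value 4) rather than at an overtone, which hints at why harmonic-aware dilations can reduce octave errors.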

URL

https://arxiv.org/abs/2205.01019

PDF

https://arxiv.org/pdf/2205.01019.pdf