MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation

2022-11-14 12:27:35

Chang-Bin Jeon, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon, Kyogu Lee

arXiv_SD

arXiv_SD Super_Resolution Pose

Abstract
Abstract (translated)
URL
PDF

Abstract

Separation of multiple singing voices into each voice is a rarely studied area in music source separation research. The absence of a benchmark dataset has hindered its progress. In this paper, we present an evaluation dataset and provide baseline studies for multiple singing voices separation. First, we introduce MedleyVox, an evaluation dataset for multiple singing voices separation that corresponds to such categories. We specify the problem definition in this dataset by categorizing the problem into i) duet, ii) unison, iii)main vs. rest, and iv) N-singing separation. Second, we present a strategy for construction of multiple singing mixtures using various single-singing datasets. This can be used to obtain training data. Third, we propose the improved super-resolution network (iSRNet). Jointly trained with the Conv-TasNet and the multi-singing mixture construction strategy, the proposed iSRNet achieved comparable performance to ideal time-frequency masks on duet and unison subsets of MedleyVox. Audio samples, the dataset, and codes are available on our GitHub page (this https URL).

Abstract (translated)

URL

https://arxiv.org/abs/2211.07302

PDF

https://arxiv.org/pdf/2211.07302.pdf