Abstract
Music creation involves not only composing the different parts (e.g., melody, chords) of a musical work but also arranging/selecting the instruments to play the different parts. While the former has received increasing attention, the latter has been little investigated. This paper presents, to the best of our knowledge, the first deep learning models for rearranging music of arbitrary genres. Specifically, we build encoders and decoders that take a piece of polyphonic musical audio as input and predict its musical score as output. We investigate disentanglement techniques such as adversarial training to separate the latent factors related to the musical content (pitch) of the different parts of the piece from those related to the instrumentation (timbre) of the parts in each short-time segment. By disentangling pitch and timbre, our models capture how each piece was composed and arranged. Moreover, the models can realize "composition style transfer" by rearranging a musical piece without significantly altering its pitch content. We validate the effectiveness of the models through experiments on instrument activity detection and composition style transfer. To facilitate follow-up research, we open-source our code at https://github.com/biboamy/instrument-disentangle.
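The "composition style transfer" described above amounts to swapping latents: encode two pieces, keep one piece's pitch latent, borrow the other's timbre latent, and decode. The following toy sketch illustrates only that latent-swapping mechanics; the `encode`/`decode` functions here are trivial vector splits standing in for the paper's learned networks, and all vectors and names are illustrative assumptions, not the actual model.

```python
import numpy as np

# Toy stand-ins for the paper's learned encoder/decoder: each "piece" is a
# vector whose first half plays the role of the pitch latent and whose
# second half plays the role of the timbre (instrumentation) latent.

def encode(x):
    """Split a piece into (pitch latent, timbre latent)."""
    half = len(x) // 2
    return x[:half].copy(), x[half:].copy()

def decode(z_pitch, z_timbre):
    """Reassemble a piece from the two latents."""
    return np.concatenate([z_pitch, z_timbre])

def style_transfer(x_content, x_style):
    """Keep x_content's pitch content, borrow x_style's instrumentation."""
    z_pitch, _ = encode(x_content)
    _, z_timbre = encode(x_style)
    return decode(z_pitch, z_timbre)

piece_a = np.array([1.0, 2.0, 0.5, 0.5])  # pitch [1, 2], timbre [0.5, 0.5]
piece_b = np.array([3.0, 4.0, 0.9, 0.1])  # pitch [3, 4], timbre [0.9, 0.1]

rearranged = style_transfer(piece_a, piece_b)
print(rearranged)  # pitch of A with the instrumentation of B
```

In the actual models, the two latents are produced by trained networks and adversarial training discourages the timbre latent from carrying pitch information, which is what makes such a swap meaningful on real audio.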
URL
https://arxiv.org/abs/1905.13567