Learning Interpretable Representation for Controllable Polyphonic Music Generation

2020-08-17 07:11:16

Ziyu Wang, Dingsu Wang, Yixiao Zhang, Gus Xia

arXiv_CL

arXiv_CL Style_Transfer

Abstract
Abstract (translated)
URL
PDF

Abstract

While deep generative models have become the leading methods for algorithmic composition, it remains a challenging problem to control the generation process because the latent variables of most deep-learning models lack good interpretability. Inspired by the content-style disentanglement idea, we design a novel architecture, under the VAE framework, that effectively learns two interpretable latent factors of polyphonic music: chord and texture. The current model focuses on learning 8-beat long piano composition segments. We show that such chord-texture disentanglement provides a controllable generation pathway leading to a wide spectrum of applications, including compositional style transfer, texture variation, and accompaniment arrangement. Both objective and subjective evaluations show that our method achieves a successful disentanglement and high quality controlled music generation.

Abstract (translated)

URL

https://arxiv.org/abs/2008.07122

PDF

https://arxiv.org/pdf/2008.07122.pdf