Paper Reading AI Learner

Musical Composition Style Transfer via Disentangled Timbre Representations

2019-05-30 06:30:37
Yun-Ning Hung, I-Tung Chiang, Yi-An Chen, Yi-Hsuan Yang

Abstract

Music creation involves not only composing the different parts (e.g., melody, chords) of a musical work but also arranging/selecting the instruments to play the different parts. While the former has received increasing attention, the latter remains largely unexplored. This paper presents, to the best of our knowledge, the first deep learning models for rearranging music of arbitrary genres. Specifically, we build encoders and decoders that take a piece of polyphonic musical audio as input and predict its musical score as output. We investigate disentanglement techniques such as adversarial training to separate latent factors that are related to the musical content (pitch) of the different parts of the piece from those related to the instrumentation (timbre) of the parts per short-time segment. By disentangling pitch and timbre, our models capture how each piece was composed and arranged. Moreover, the models can realize "composition style transfer" by rearranging a musical piece without substantially altering its pitch content. We validate the effectiveness of the models through experiments on instrument activity detection and composition style transfer. To facilitate follow-up research, we open source our code at https://github.com/biboamy/instrument-disentangle.
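The adversarial disentanglement idea mentioned in the abstract is commonly implemented with a gradient reversal layer: an auxiliary classifier tries to predict instrument (timbre) labels from the pitch latent, and its gradient is flipped before flowing back into the encoder, pushing the encoder to strip timbre information out of that latent. The sketch below is a minimal, framework-free illustration of that mechanism, not the authors' actual code; the variable names and the pitch/timbre split are illustrative assumptions.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lambda in the
    backward pass, so the encoder is trained to *defeat* the adversarial
    timbre classifier attached after this layer (a hypothetical sketch)."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, z):
        # Activations pass through unchanged.
        return z

    def backward(self, grad_from_classifier):
        # Reversed (and scaled) gradient flows back to the encoder.
        return -self.lam * grad_from_classifier

# Toy latent code: first half treated as "pitch", second half as "timbre".
z = np.arange(8, dtype=float)
z_pitch, z_timbre = z[:4], z[4:]

grl = GradientReversal(lam=0.5)
out = grl.forward(z_pitch)          # identical to z_pitch
grad = grl.backward(np.ones(4))     # -0.5 * ones: encoder gradient is reversed
print(grad)                         # -> [-0.5 -0.5 -0.5 -0.5]
```

In a full model, the reversed gradient would be added to the encoder's update alongside the losses for score prediction and timbre classification from the timbre latent, so only the pitch latent is purged of instrument cues.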

URL

https://arxiv.org/abs/1905.13567

PDF

https://arxiv.org/pdf/1905.13567.pdf
