Paper Reading AI Learner

Deep Music Analogy Via Latent Representation Disentanglement

2019-06-09 12:22:06
Ruihan Yang, Dingsu Wang, Ziyu Wang, Tianyao Chen, Junyan Jiang, Gus Xia

Abstract

Analogy is a key approach to automated music generation, notable for its ability to generate pieces that are both natural and creative based on only a few examples. In general, an analogy is made by partially transferring the music abstractions, i.e., high-level representations and their relationships, from one piece to another; however, this procedure requires disentangling music representations, which takes little effort for musicians but is non-trivial for computers. Three sub-problems arise: extracting latent representations from the observation, disentangling the representations so that each part has a unique semantic interpretation, and mapping the latent representations back to actual music. An explicitly-constrained conditional variational auto-encoder (EC2-VAE) is proposed as a unified solution to all three sub-problems. In this study, we focus on disentangling the pitch and rhythm representations of 8-beat music clips conditioned on chords. In producing music analogies, this model helps us realize the imaginary situation of "what if" a piece were composed with a different pitch contour, rhythm pattern, chord progression, etc., by borrowing the representations from other pieces. Finally, we validate the proposed disentanglement method using objective measurements and evaluate the analogy examples in a subjective study.
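The analogy-making step described above amounts to swapping one semantic sub-vector of the latent code between two encoded pieces. The following is a minimal sketch of that swap, assuming (hypothetically, for illustration only) that the encoder emits a flat latent vector whose first few dimensions encode pitch and whose remainder encodes rhythm; the split point, dimensions, and function names are not taken from the paper.

```python
# Illustrative sketch of analogy-by-swapping over a disentangled latent.
# The pitch/rhythm split point and vector sizes are assumptions, not the
# actual EC2-VAE architecture.

PITCH_DIMS = 4  # assumed size of the pitch sub-vector (illustrative)


def make_analogy(z_source, z_donor, pitch_dims=PITCH_DIMS):
    """Keep the pitch part of z_source, borrow the rhythm part of z_donor.

    Both arguments are flat latent vectors whose first `pitch_dims`
    entries encode pitch and whose remaining entries encode rhythm.
    """
    return z_source[:pitch_dims] + z_donor[pitch_dims:]


# Toy latent codes standing in for two encoded 8-beat clips.
z_a = [1.0, 2.0, 3.0, 4.0, 0.5, 0.6]  # piece A: pitch part + rhythm part
z_b = [9.0, 8.0, 7.0, 6.0, 0.1, 0.2]  # piece B: pitch part + rhythm part

# "What if piece A were played with piece B's rhythm?"
z_analogy = make_analogy(z_a, z_b)
# z_analogy -> [1.0, 2.0, 3.0, 4.0, 0.1, 0.2]
```

Decoding `z_analogy` (conditioned on a chosen chord progression) would then yield the analogy piece; the swap itself is meaningful only because the disentanglement objective gives each sub-vector a distinct semantic interpretation.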


URL

https://arxiv.org/abs/1906.03626

PDF

https://arxiv.org/pdf/1906.03626.pdf

