Music Style Transfer: A Position Paper

2018-07-16 09:24:18
Gus G. Xia, Shuqi Dai

Abstract

Following the success of neural style transfer in the visual arts, there has recently been a rising trend of work on music style transfer. However, "music style" is not yet a well-defined concept from a scientific point of view. The difficulty lies in the intrinsically multi-level and multi-modal character of music representation, which is very different from image representation. As a result, depending on their interpretation of "music style", current studies under the category of "music style transfer" are actually solving completely different problems that belong to a variety of sub-fields of computer music. Also, a vanilla end-to-end approach, which aims to deal with all levels of music representation at once by directly adopting the method of image style transfer, leads to poor results. Thus, we propose a more scientifically viable definition of music style transfer by breaking it down into the precise concepts of timbre style transfer, performance style transfer, and composition style transfer, and we connect the different aspects of music style transfer with existing, well-established sub-fields of computer music research. In addition, we discuss the current limitations of music style modeling and its future directions, drawing inspiration from deep generative models, especially those that use unsupervised learning and disentanglement techniques.

URL

https://arxiv.org/abs/1803.06841

PDF

https://arxiv.org/pdf/1803.06841.pdf

