Abstract
Driven by the success of neural style transfer in the visual arts, efforts toward music style transfer have recently been on the rise. However, "music style" is not yet a well-defined concept from a scientific point of view. The difficulty lies in the intrinsically multi-level and multi-modal character of music representation, which differs markedly from image representation. As a result, depending on how they interpret "music style", current studies grouped under "music style transfer" are actually solving quite different problems that belong to a variety of sub-fields of computer music. Moreover, a vanilla end-to-end approach, which tries to handle all levels of music representation at once by directly adopting image style transfer methods, leads to poor results. We therefore propose a more scientifically viable definition of music style transfer by breaking it down into three precise concepts: timbre style transfer, performance style transfer, and composition style transfer. We also connect these aspects of music style transfer with well-established sub-fields of computer music research. In addition, we discuss the current limitations of music style modeling and its future directions, drawing inspiration from deep generative models, especially those that use unsupervised learning and disentanglement techniques.
URL
https://arxiv.org/abs/1803.06841