Abstract
While many topics in learning-based automated music generation are under active research, musical form remains under-researched. In particular, recent methods based on deep learning models generate music that lacks structure at the largest time scale. In practice, music longer than one minute generated by such models is either unpleasantly repetitive or directionless. Adapting a recent music generation model, this paper proposes a novel method to generate music with form. The experimental results show that the proposed method can generate 2.5-minute-long music that is considered as pleasant as the music used to train the model. The paper first reviews a recent music generation method based on language models (the Transformer architecture) and discusses why learning musical form with such models is infeasible. We then present the proposed method and the experiments.
URL
https://arxiv.org/abs/2404.11976