Abstract
We consider the task of multi-track MIDI infilling when arbitrary (track, measure) pairs of information have been deleted from a contiguous slice of measures from a MIDI file. We train two T5-like models to solve this task, one using a basic MIDI-like event vocabulary and one using a joined word-like version of this vocabulary. We introduce a new test set, created from the Lakh MIDI dataset, consisting of 9 multi-track MIDI infilling tasks. We evaluate our models on these tasks and find that one model works better on some tasks while the other works better on others. Our results have implications for the training of neural networks in other small-vocabulary domains, such as byte sequence modeling and protein sequence modeling. We release our source code, and we demonstrate that our models are capable of enabling real-time human-computer interactive composition in the REAPER digital audio workstation.
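As an illustration of the task setup only, the following minimal sketch shows how deleting arbitrary (track, measure) pairs could be turned into a T5-style source/target pair. The token names (`<track_0>`, `note_on_60`, `shift_4`), the `<extra_id_k>` sentinel format, and the function `make_infilling_example` are assumptions made for this example; they are not taken from the paper's vocabulary or released code.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (track index, measure index)


def make_infilling_example(
    grid: Dict[Cell, List[str]],
    masked: Set[Cell],
    n_tracks: int,
    n_measures: int,
) -> Tuple[List[str], List[str]]:
    """Build a (source, target) token pair from a track-by-measure grid.

    Hypothetical layout: each masked cell is replaced in the source by one
    sentinel token, and the target lists each sentinel followed by the
    events that were removed from that cell.
    """
    source: List[str] = []
    target: List[str] = []
    next_sentinel = 0
    for track in range(n_tracks):
        source.append(f"<track_{track}>")
        for measure in range(n_measures):
            source.append(f"<measure_{measure}>")
            events = grid.get((track, measure), [])
            if (track, measure) in masked:
                sentinel = f"<extra_id_{next_sentinel}>"
                next_sentinel += 1
                source.append(sentinel)   # placeholder the model must fill
                target.append(sentinel)
                target.extend(events)     # the deleted content
            else:
                source.extend(events)     # visible context
    return source, target


# Toy 2-track, 2-measure slice with made-up event tokens.
grid = {
    (0, 0): ["note_on_60", "shift_4", "note_off_60"],
    (0, 1): ["note_on_62", "shift_4", "note_off_62"],
    (1, 0): ["note_on_36", "shift_8", "note_off_36"],
    (1, 1): ["note_on_38", "shift_8", "note_off_38"],
}
source, target = make_infilling_example(
    grid, masked={(0, 1), (1, 0)}, n_tracks=2, n_measures=2
)
print(source)
print(target)
```

Here the cells (track 0, measure 1) and (track 1, measure 0) are hidden from the source, so a sequence-to-sequence model would be asked to generate their events after the corresponding sentinels while conditioning on the remaining cells.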
URL
https://arxiv.org/abs/2301.12525