Spectrogram Inpainting for Interactive Generation of Instrument Sounds

2021-04-15 15:17:31

Théis Bazin, Gaëtan Hadjeres, Philippe Esling, Mikhail Malt

arXiv_AI

Abstract

Modern approaches to sound synthesis using deep neural networks are hard to control, especially when fine-grained conditioning information is not available, which hinders their adoption by musicians. In this paper, we cast the generation of individual instrumental notes as an inpainting-based task, introducing novel ways to iteratively shape sounds. To this end, we propose a two-step approach: first, we adapt the VQ-VAE-2 image generation architecture to spectrograms in order to convert real-valued spectrograms into compact discrete codemaps; second, we implement token-masked Transformers for the inpainting-based generation of these codemaps. We apply the proposed architecture to the NSynth dataset on masked resampling tasks. Most crucially, we open-source an interactive web interface to transform sounds by inpainting, for artists and practitioners alike, opening up new creative uses.
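
The two-step idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (quantize, inpaint, uniform_sampler), the tiny codebook, and the uniform sampler standing in for a trained token-masked Transformer are all assumptions made for illustration. The sketch only shows the general shape of the technique: real-valued vectors are mapped to a discrete codemap by nearest-codebook lookup, and a user-selected mask decides which tokens are resampled while the rest stay fixed.

```python
import random


def quantize(frames, codebook):
    """Map each real-valued vector to the index of its nearest codebook entry."""
    def nearest(vec):
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        return dists.index(min(dists))
    return [[nearest(vec) for vec in row] for row in frames]


def inpaint(codemap, mask, sample_token, rng):
    """Resample only the masked positions; unmasked tokens are left untouched."""
    result = [row[:] for row in codemap]  # copy so the input codemap is preserved
    for i, row in enumerate(mask):
        for j, masked in enumerate(row):
            if masked:
                result[i][j] = sample_token(result, i, j, rng)
    return result


def uniform_sampler(num_codes):
    """Hypothetical placeholder for a trained model: ignores context entirely."""
    def sample(codemap, i, j, rng):
        return rng.randrange(num_codes)
    return sample


# Toy data: a 2x2 grid of 2-dimensional latent vectors and a 3-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[[0.1, -0.1], [0.9, 1.2]],
          [[0.1, 0.8], [0.9, 0.9]]]

codemap = quantize(frames, codebook)          # [[0, 1], [2, 1]]
mask = [[False, True], [False, False]]        # regenerate only the top-right token
edited = inpaint(codemap, mask, uniform_sampler(len(codebook)), random.Random(0))
```

In an actual system, the sampler would condition on the surrounding unmasked tokens, and the edited codemap would be decoded back into a spectrogram; both steps are omitted here.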

URL

https://arxiv.org/abs/2104.07519

PDF

https://arxiv.org/pdf/2104.07519.pdf