Toward Universal Text-to-Music Retrieval

2022-11-26 13:07:26

SeungHeon Doh, Minz Won, Keunwoo Choi, Juhan Nam

arXiv_SD

Abstract

This paper introduces effective design choices for text-to-music retrieval systems. An ideal text-based retrieval system would support various input queries, such as pre-defined tags, unseen tags, and sentence-level descriptions. In practice, most previous work has focused on a single query type (tag or sentence), and the resulting systems may not generalize to the other input type. Hence, we review recent text-based music retrieval systems using our proposed benchmark, focusing on two main aspects: input text representation and training objectives. Our findings enable a universal text-to-music retrieval system that achieves comparable retrieval performance on both tag- and sentence-level inputs. Furthermore, the proposed multimodal representation generalizes to 9 different downstream music classification tasks. We present the code and demo online.

URL

https://arxiv.org/abs/2211.14558

PDF

https://arxiv.org/pdf/2211.14558.pdf