SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling

2025-06-17 08:08:08
Tawsif Ahmed, Andrej Radonjic, Gollam Rabby

Abstract

We present Sleeping-DISCO 9M, a large-scale pre-training dataset for music and song. To the best of our knowledge, there is no open-source, high-quality dataset representing popular and well-known songs for generative music modeling tasks such as text-to-music generation, music captioning, singing-voice synthesis, melody reconstruction, and cross-modal retrieval. Past contributions focused on isolated and constrained factors, with the core aim of creating synthetic or re-recorded music corpora (e.g., GTSinger, M4Singer); arbitrarily large-scale audio datasets (e.g., DISCO-10M and LAIONDISCO-12M) have been another focus for the community. Unfortunately, adoption of these datasets in the generative music community has been limited, as they fail to reflect real-world music and its flavour. Our dataset changes this narrative: it is constructed from actual popular music by world-renowned artists.

URL

https://arxiv.org/abs/2506.14293

PDF

https://arxiv.org/pdf/2506.14293.pdf

