Paper Reading AI Learner

An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio Features

2023-01-31 09:48:53
Mengyisong Zhao, Morgan Harvey, David Cameron, Frank Hopfgartner, Valerie J. Gillet

Abstract

Hit song prediction, one of the emerging fields in music information retrieval (MIR), remains a considerable challenge. Being able to understand what makes a given song a hit is clearly beneficial to the whole music industry. Previous approaches to hit song prediction have focused on using audio features of a record. This study aims to improve the prediction of top 10 hits among Billboard Hot 100 songs by using additional metadata, including song audio features provided by Spotify, song lyrics, and novel metadata-based features (title topic, popularity continuity and genre class). Five machine learning approaches are applied: k-nearest neighbours, Naive Bayes, Random Forest, Logistic Regression and Multilayer Perceptron. Our results show that Random Forest (RF) and Logistic Regression (LR) with all features (novel features, song audio features and lyrics features) outperform the other models, achieving 89.1% and 87.2% accuracy and 0.91 and 0.93 AUC, respectively. Our findings also demonstrate the utility of our novel music metadata features, which contributed most to the models' discriminative performance.
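The abstract reports both accuracy and AUC, and the two metrics rank RF and LR differently (RF has the higher accuracy, LR the higher AUC). This can happen because accuracy depends on a fixed decision threshold, while AUC only measures how well the scores rank hits above non-hits. Below is a minimal stdlib-only sketch of both metrics, assuming a binary label (1 = top-10 hit, 0 = any other Hot 100 song) and a 0.5 threshold. The function names are hypothetical and the scores are invented toy values, not data or models from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of songs whose thresholded score matches the label (1 = top-10 hit)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random hit is scored above a random non-hit.

    Ties between a hit and a non-hit count as half a win.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one hit and one non-hit")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two hits followed by four non-hits.
y_true = [1, 1, 0, 0, 0, 0]
# Model A: well placed around the 0.5 threshold, but one non-hit outranks a hit.
model_a = [0.9, 0.55, 0.6, 0.2, 0.1, 0.3]
# Model B: ranks every hit above every non-hit, but two non-hits score above 0.5.
model_b = [0.95, 0.9, 0.8, 0.7, 0.1, 0.2]

print(f"A: accuracy={accuracy(y_true, model_a):.3f}, AUC={roc_auc(y_true, model_a):.3f}")
# → A: accuracy=0.833, AUC=0.875
print(f"B: accuracy={accuracy(y_true, model_b):.3f}, AUC={roc_auc(y_true, model_b):.3f}")
# → B: accuracy=0.667, AUC=1.000
```

In this toy case model A wins on accuracy and model B wins on AUC. This mirrors the kind of disagreement reported for RF and LR, though the paper's actual evaluation setup is not reproduced here.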

URL

https://arxiv.org/abs/2301.13507

PDF

https://arxiv.org/pdf/2301.13507.pdf
