Abstract
Deep-learning-based methods have become a standard approach to cover song identification (CSI) in recent years, and the ByteCover systems have achieved state-of-the-art results on all mainstream CSI datasets. However, with the rapid growth of short videos, many real-world applications need to match short music excerpts against full-length tracks in a database, a setting that remains under-explored and lacks an industrial-level solution. In this paper, we upgrade the previous ByteCover systems to ByteCover3, which uses local features to further improve identification performance on short music queries. ByteCover3 is designed with a local alignment loss (LAL) module and a two-stage feature retrieval pipeline, allowing the system to perform CSI more precisely and efficiently. We evaluated ByteCover3 on multiple datasets under different benchmark settings, where it outperformed all compared methods, including the earlier ByteCover versions.
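The abstract does not spell out how the two-stage feature retrieval pipeline works, so the following is only a minimal sketch of one common way such a pipeline can be organized, not the ByteCover3 implementation. It assumes each track and each query is already represented by a matrix of local feature vectors; the mean-pooled global embedding, the max-then-mean local score, and all function and parameter names (`two_stage_search`, `top_k`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def two_stage_search(query_local, db_local, top_k=2):
    """Hypothetical two-stage search over local features.

    query_local: (m, d) array of local features for a short query.
    db_local:    list of (n_i, d) arrays, one per full-length track.
    Returns a list of (track_index, score) pairs, best match first.
    """
    q_local = l2_normalize(query_local)
    tracks = [l2_normalize(t) for t in db_local]

    # Stage 1 (assumed): coarse search with one mean-pooled vector per item.
    q_global = l2_normalize(q_local.mean(axis=0))
    db_global = np.stack([l2_normalize(t.mean(axis=0)) for t in tracks])
    coarse_scores = db_global @ q_global
    candidates = np.argsort(-coarse_scores)[:top_k]

    # Stage 2 (assumed): re-rank candidates with local features. Each query
    # vector is matched to its most similar vector in the track, and the
    # per-vector maxima are averaged into one score.
    scored = []
    for idx in candidates:
        sim = q_local @ tracks[idx].T  # shape (m, n_i)
        scored.append((int(idx), float(sim.max(axis=1).mean())))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

The point of splitting the search this way is that the cheap first stage limits how many tracks need the more expensive local comparison, while the second stage lets a short excerpt match the part of a long track it actually came from.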
URL
https://arxiv.org/abs/2303.11692