Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism

2023-03-23 13:52:42

Dichucheng Li, Mingjin Che, Wenwu Meng, Yulun Wu, Yi Yu, Fan Xia, Wei Li

arXiv_SD

arXiv_SD Detection Classification Attention Pose Action


Abstract

Instrument playing technique (IPT) is a key element of musical presentation. However, most existing work on IPT detection considers only monophonic music signals, and little has been done to detect IPTs in polyphonic instrumental solo pieces with overlapping or mixed IPTs. In this paper, we formulate IPT detection as a frame-level multi-label classification problem and apply it to the Guzheng, a Chinese plucked string instrument. We create a new dataset, Guzheng_Tech99, containing Guzheng recordings with onset, offset, pitch, and IPT annotations for each note. Because different IPTs vary widely in length, we propose a new method that combines a multi-scale network with self-attention. The multi-scale network extracts features at different scales, and the self-attention mechanism applied to the feature maps at the coarsest scale further enhances long-range feature extraction. Our approach outperforms existing methods by a large margin, indicating its effectiveness in IPT detection.
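The frame-level multi-label formulation can be made concrete by converting note-level annotations (onset, offset, IPT), of the kind Guzheng_Tech99 is described as containing, into a frames-by-classes target matrix. The following is a minimal sketch, not the authors' preprocessing: the 10 ms hop, the rounding rule for frame boundaries, the function names `notes_to_frame_labels` and `decode_frames`, and the three IPT class names are all hypothetical assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical IPT class names for illustration only; they are NOT taken
# from the Guzheng_Tech99 label set.
IPT_NAMES = ["vibrato", "glissando", "tremolo"]

Note = Tuple[float, float, str]  # (onset_sec, offset_sec, ipt_name)


def notes_to_frame_labels(
    notes: Sequence[Note],
    duration: float,
    hop: float = 0.01,  # assumed 10 ms hop size
    ipt_names: Sequence[str] = IPT_NAMES,
) -> List[List[int]]:
    """Turn note-level (onset, offset, IPT) annotations into a
    frames x classes multi-hot target matrix.

    Frame i is treated as covering [i * hop, (i + 1) * hop); a note marks
    every frame from round(onset / hop) up to, but excluding,
    round(offset / hop). Overlapping notes with different IPTs set
    several columns to 1 in the same frame, which is what makes the
    task multi-label rather than multi-class.
    """
    column = {name: i for i, name in enumerate(ipt_names)}
    n_frames = int(round(duration / hop))
    labels = [[0] * len(ipt_names) for _ in range(n_frames)]
    for onset, offset, ipt in notes:
        start = max(0, int(round(onset / hop)))
        end = min(n_frames, int(round(offset / hop)))
        for frame in range(start, end):
            labels[frame][column[ipt]] = 1
    return labels


def decode_frames(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Threshold each class independently (sigmoid-style multi-label
    decoding) instead of taking a single argmax per frame."""
    return [[1 if p >= threshold else 0 for p in frame] for frame in probs]


if __name__ == "__main__":
    notes = [(0.00, 0.05, "vibrato"), (0.03, 0.08, "glissando")]
    for i, row in enumerate(notes_to_frame_labels(notes, duration=0.10)):
        print(i, row)
```

In the usage example, frames 3 and 4 carry both the vibrato and glissando labels because the two notes overlap there. A single-label (softmax) classifier could not represent those frames, which is why per-class thresholding is used at decoding time.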

Abstract (translated)

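Instrument playing techniques (IPTs) are a key element of musical presentation. However, most existing IPT detection work focuses only on monophonic music signals, and detecting IPTs in polyphonic instrumental solo pieces with overlapping or mixed IPTs remains a challenge. In this paper, we model IPT detection as a frame-level multi-label classification problem and apply it to the Guzheng, a Chinese plucked string instrument. We create a new dataset, Guzheng_Tech99, containing Guzheng recordings with onset, offset, pitch, and IPT annotations for each note. Because different IPTs vary greatly in length, we propose a new method that uses a multi-scale network and self-attention to solve this problem. The multi-scale network extracts features at different scales, and the self-attention mechanism is applied to the feature maps at the coarsest scale, further enhancing long-range feature extraction. Our method outperforms existing methods, demonstrating its effectiveness for IPT detection.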
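The architectural idea in the abstract (features at several temporal scales, with self-attention applied only to the coarsest feature maps) can be illustrated with a toy sketch. This is not the authors' network: average pooling stands in for the learned convolutional downsampling a real multi-scale network would use, the attention has a single head with no learned query/key/value projections, and the names `avg_pool`, `self_attention`, and `multi_scale_features`, as well as the pooling factors (1, 2, 4), are hypothetical.

```python
import math
from typing import List, Sequence

Frame = List[float]


def avg_pool(seq: Sequence[Frame], factor: int) -> List[Frame]:
    """Downsample a (time x dim) sequence by averaging non-overlapping
    windows of `factor` frames; a shorter trailing window is averaged too."""
    pooled = []
    for start in range(0, len(seq), factor):
        window = seq[start:start + factor]
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: Sequence[Frame]) -> List[Frame]:
    """Single-head scaled dot-product self-attention over time.
    Queries, keys and values are the frames themselves (identity
    projections), so there are no learned weights in this sketch."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def multi_scale_features(seq: Sequence[Frame], factors: Sequence[int] = (1, 2, 4)) -> List[List[Frame]]:
    """Build feature sequences at several temporal resolutions and apply
    self-attention only at the coarsest one. Assumes `factors` is sorted
    ascending, so the last entry is the coarsest scale."""
    scales = [avg_pool(seq, f) for f in factors]
    scales[-1] = self_attention(scales[-1])
    return scales


if __name__ == "__main__":
    frames = [[float(t), float(t % 2)] for t in range(8)]
    for f, s in zip((1, 2, 4), multi_scale_features(frames)):
        print(f"factor {f}: {len(s)} steps")
```

One side effect of attending only at the coarsest scale is cost: self-attention scales quadratically with sequence length, so pooling by a factor of 4 turns a 1,000-frame excerpt into 250 steps and shrinks the attention score matrix by a factor of 16, while each coarse step still summarizes a longer stretch of audio, which suits long IPTs.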

URL

https://arxiv.org/abs/2303.13272

PDF

https://arxiv.org/pdf/2303.13272.pdf