Paper Reading AI Learner

Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription

2025-06-17 06:25:35
Anna Hamberger, Sebastian Murgul, Jochen Schmidt, Michael Heizmann

Abstract

Music transcription plays a pivotal role in Music Information Retrieval (MIR), particularly for stringed instruments like the guitar, where symbolic music notations such as MIDI lack crucial playability information. This contribution introduces the Fretting-Transformer, an encoder-decoder model that utilizes a T5 transformer architecture to automate the transcription of MIDI sequences into guitar tablature. By framing the task as a symbolic translation problem, the model addresses key challenges, including string-fret ambiguity and physical playability. The proposed system leverages diverse datasets, including DadaGP, GuitarToday, and Leduc, with novel data pre-processing and tokenization strategies. We have developed metrics for tablature accuracy and playability to evaluate performance quantitatively. The experimental results demonstrate that the Fretting-Transformer surpasses baseline methods like A* and commercial applications like Guitar Pro. The integration of context-sensitive processing and tuning/capo conditioning further enhances the model's performance, laying a robust foundation for future developments in automated guitar transcription.
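The abstract frames MIDI-to-tablature transcription as a symbolic translation problem solved with a T5 encoder-decoder. The sketch below shows how such a framing might look in code, assuming Hugging Face `transformers`; the token format, the tuning/capo conditioning prefix, and the public "t5-small" checkpoint are illustrative assumptions, not the authors' released tokenization or model.

```python
# Minimal sketch of the "symbolic translation" framing: MIDI note tokens in,
# (string, fret) tokens out, using a stock T5 seq2seq model.
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Hypothetical source sequence: MIDI note events rendered as text tokens, with a
# tuning/capo conditioning prefix (notes 64/59/55 = open high-E, B, and G strings).
midi_tokens = "tuning:E-standard capo:0 note:64 note:59 note:55"

# Hypothetical target sequence: one (string, fret) pair per input note.
tab_tokens = "string:1 fret:0 string:2 fret:0 string:3 fret:0"

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Training step: standard seq2seq cross-entropy against the tablature tokens.
inputs = tokenizer(midi_tokens, return_tensors="pt")
labels = tokenizer(tab_tokens, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss
loss.backward()

# Inference: decode string/fret tokens for the given MIDI sequence.
pred_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(pred_ids[0], skip_special_tokens=True))
```

In a full system, onset/duration information and a window of neighboring notes (the paper's context-sensitive processing) would presumably be encoded into the source tokens as well; this sketch only illustrates the conditioning-prefix and token-translation idea.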

Abstract (translated)

Music transcription plays a crucial role in Music Information Retrieval (MIR), especially for stringed instruments such as the guitar, because symbolic notations like MIDI lack key playability information. This paper introduces the Fretting-Transformer, an encoder-decoder model that uses the T5 transformer architecture to automatically convert MIDI sequences into guitar tablature. By treating the task as a symbolic translation problem, the model addresses key challenges including string-fret ambiguity and physical playability. The proposed system uses diverse datasets such as DadaGP, GuitarToday, and Leduc, together with novel data pre-processing and tokenization strategies. We develop metrics for tablature accuracy and playability to evaluate the model's performance quantitatively. Experimental results show that the Fretting-Transformer outperforms baseline methods (such as the A* algorithm) and commercial applications (such as Guitar Pro). Integrating context-sensitive processing and tuning/capo conditioning further improves the model's performance, laying a solid foundation for future developments in automated guitar transcription.

URL

https://arxiv.org/abs/2506.14223

PDF

https://arxiv.org/pdf/2506.14223.pdf

