
Examining the Mapping Functions of Denoising Autoencoders in Music Source Separation

2019-04-12 11:22:43
Stylianos Ioannis Mimilakis, Konstantinos Drossos, Estefanía Cano, Gerald Schuller

Abstract

The goal of this work is to investigate what music source separation approaches based on neural networks learn from the data. We examine the mapping functions of neural networks that are based on the denoising autoencoder (DAE) model and conditioned on the mixture magnitude spectra. To approximate the mapping functions, we propose an algorithm inspired by knowledge distillation, denoted the neural couplings algorithm (NCA). The NCA yields a matrix that expresses the mapping of the mixture to the target source magnitude information. Using the NCA, we examine the mapping functions of three fundamental DAE models in music source separation: one with a single-layer encoder and decoder, one with a multi-layer encoder and a single-layer decoder, and one using skip-filtering (SF) connections with a single encoding and decoding layer. We first train these models on realistic data to estimate the singing voice magnitude spectra from the corresponding mixture. We then use the optimized models and test spectral data as input to the NCA. Our experimental findings show that approaches based on the DAE model learn scalar filtering operators, exhibiting a predominantly diagonal structure in their corresponding mapping functions, which limits the exploitation of the inter-frequency structure of music data. In contrast, skip-filtering connections are shown to assist the DAE model in learning filtering operators that exploit richer inter-frequency structure.
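
To make the notion of a "predominantly diagonal" mapping concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the NCA and does not reproduce the paper's experiments. It only shows that a diagonal mapping matrix applied to a magnitude frame reduces to a per-frequency scalar gain, whereas a dense matrix mixes information across frequency bins. The function `diagonal_energy_ratio`, the bin count, and the random data are all hypothetical choices made for illustration.

```python
import numpy as np

def diagonal_energy_ratio(mapping):
    """Fraction of a square matrix's squared energy that lies on its main diagonal.

    Hypothetical helper for illustration; not a metric taken from the paper.
    """
    mapping = np.asarray(mapping, dtype=float)
    total = np.sum(mapping ** 2)
    if total == 0.0:
        return 0.0
    return float(np.sum(np.diag(mapping) ** 2) / total)

rng = np.random.default_rng(0)
n_bins = 8                                 # assumed number of frequency bins
mixture = rng.random(n_bins)               # toy mixture magnitude frame

gains = rng.random(n_bins)                 # one scalar gain per frequency bin
diagonal_map = np.diag(gains)              # diagonal mapping: scalar filtering
dense_map = rng.random((n_bins, n_bins))   # dense mapping: couples frequency bins

# A diagonal mapping is the same as element-wise scaling of the mixture.
scalar_filtered = diagonal_map @ mixture

# A dense mapping lets each output bin depend on every input bin.
dense_filtered = dense_map @ mixture

print(diagonal_energy_ratio(diagonal_map))
print(diagonal_energy_ratio(dense_map))
```

A ratio close to 1 indicates a mapping dominated by its diagonal, which is the structure the abstract reports for the plain DAE models. A lower ratio indicates more coupling between frequency bins.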


URL

https://arxiv.org/abs/1904.06157

PDF

https://arxiv.org/pdf/1904.06157.pdf

