Paper Reading AI Learner

Residual Memory Networks: Feed-forward approach to learn long temporal dependencies

2018-08-06 14:00:40
Murali Karthick Baskar, Martin Karafiat, Lukas Burget, Karel Vesely, Frantisek Grezl, Jan Honza Cernocky

Abstract

Training deep recurrent neural network (RNN) architectures is complicated by the increased network complexity, which disrupts the learning of higher-order abstractions in deep RNNs. Feed-forward networks, in contrast, are simple and fast to train in deep configurations, but cannot learn long-term temporal information. In this paper we propose a residual memory network (RMN) architecture that models short-term dependencies using deep feed-forward layers with residual and time-delayed connections. The residual connections pave the way for deeper networks by enabling unhindered gradient flow, while the time-delay units capture temporal information with shared weights. The number of layers in an RMN determines both the hierarchical processing depth and the temporal depth. The computational cost of training an RMN is significantly lower than that of deep recurrent networks. RMN is further extended to a bi-directional RMN (BRMN) that captures both past and future context. Experimental analysis on the AMI corpus substantiates RMN's ability to learn long-term and hierarchical information. The recognition performance of an RMN trained on 300 hours of the Switchboard corpus is compared with various state-of-the-art LVCSR systems. The results indicate that RMN and BRMN gain 6% and 3.8% relative improvement over LSTM and BLSTM networks, respectively.
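The core idea in the abstract, a feed-forward layer combining a time-delayed connection (shared weights) with a residual skip, so that stacking L layers widens the temporal context without recurrence, can be illustrated with a minimal NumPy sketch. The function name, the delay of one frame, the ReLU nonlinearity, and the exact layer equation are assumptions for illustration, not the paper's precise formulation:

```python
import numpy as np

def rmn_layer(h_prev, W, W_delay, delay=1):
    """One illustrative residual-memory layer.

    h_prev  : (T, D) hidden vectors from the layer below, T time frames.
    W       : (D, D) weights on the current frame.
    W_delay : (D, D) weights on the frame `delay` steps in the past;
              reusing the same W/W_delay at every layer mirrors the
              shared-weight time-delay units described in the abstract.
    """
    # Time-delayed input: h_prev shifted by `delay` frames, zero-padded.
    delayed = np.zeros_like(h_prev)
    delayed[delay:] = h_prev[:-delay]
    pre = h_prev @ W + delayed @ W_delay
    act = np.maximum(pre, 0.0)   # ReLU activation
    return act + h_prev          # residual connection for gradient flow

rng = np.random.default_rng(0)
T, D, n_layers = 20, 8, 4
h = rng.standard_normal((T, D))
W = 0.1 * rng.standard_normal((D, D))
W_delay = 0.1 * rng.standard_normal((D, D))
# Each extra layer extends the receptive field one more frame into the
# past, so depth doubles as temporal depth (here: 4 past frames).
for _ in range(n_layers):
    h = rmn_layer(h, W, W_delay, delay=1)
print(h.shape)  # (20, 8)
```

A bi-directional variant (BRMN) would, by the same logic, add a second delayed term built from future frames (`delayed[:-delay] = h_prev[delay:]`) so each layer also looks one frame ahead.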

URL

https://arxiv.org/abs/1808.01916

PDF

https://arxiv.org/pdf/1808.01916.pdf
