Working Memory Networks: Augmenting Memory Networks with a Relational Reasoning Module

2018-05-23 18:03:08
Juan Pavez, Héctor Allende, Héctor Allende-Cid

Abstract

In recent years, there has been a lot of interest in achieving some kind of complex reasoning using deep neural networks. To do that, models like Memory Networks (MemNNs) have combined external memory storage and attention mechanisms. These architectures, however, lack the more complex reasoning mechanisms that could allow, for instance, relational reasoning. Relation Networks (RNs), on the other hand, have shown outstanding results in relational reasoning tasks. Unfortunately, their computational cost grows quadratically with the number of memories, which is prohibitive for larger problems. To solve these issues, we introduce the Working Memory Network, a MemNN architecture with a novel working memory storage and reasoning module. Our model retains the relational reasoning abilities of the RN while reducing its computational complexity from quadratic to linear. We tested our model on the text QA dataset bAbI and the visual QA dataset NLVR. On the jointly trained bAbI-10k, we set a new state of the art, achieving a mean error of less than 0.5%. Moreover, a simple ensemble of two of our models solves all 20 tasks in the joint version of the benchmark.
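
The quadratic-versus-linear claim can be illustrated with a rough operation count. The sketch below is not the authors' implementation. It assumes that a Relation Network scores every ordered pair of memories, and that the working-memory variant first reads all memories once per attention hop and then applies the pairwise module only to the small set of attended outputs. The function names and the hop count of 4 are hypothetical, chosen only for illustration.

```python
def rn_pair_count(num_memories):
    """Pairs scored when every memory is compared with every memory (assumed RN cost)."""
    return num_memories * num_memories


def working_memory_cost(num_memories, num_hops):
    """Assumed cost of the working-memory variant.

    Each attention hop reads all memories once (linear in num_memories);
    the relational module then compares only the num_hops attended outputs.
    """
    attention_reads = num_hops * num_memories
    relational_pairs = num_hops * num_hops
    return attention_reads + relational_pairs


if __name__ == "__main__":
    # num_hops=4 is an illustrative value, not a figure taken from the paper.
    for n in (10, 100, 1000):
        print(n, rn_pair_count(n), working_memory_cost(n, num_hops=4))
```

With the number of hops held fixed, the second count grows in proportion to the number of memories, while the first grows with its square.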


URL

https://arxiv.org/abs/1805.09354

PDF

https://arxiv.org/pdf/1805.09354.pdf

