Paper Reading AI Learner

A Memory Model for Question Answering from Streaming Data Supported by Rehearsal and Anticipation of Coreference Information

2023-05-12 15:46:36
Vladimir Araujo, Alvaro Soto, Marie-Francine Moens

Abstract

Existing question answering methods often assume that the input content (e.g., documents or videos) is always accessible to solve the task. Alternatively, memory networks were introduced to mimic the human process of incremental comprehension and compression of the information in a fixed-capacity memory. However, these models learn to maintain memory only by backpropagating errors in the answers through the entire network. Instead, it has been suggested that humans have effective mechanisms to boost their memorization capacities, such as rehearsal and anticipation. Drawing inspiration from these, we propose a memory model that performs rehearsal and anticipation while processing inputs to memorize important information for solving question answering tasks from streaming data. The proposed mechanisms are applied in a self-supervised fashion during training through masked modeling tasks focused on coreference information. We validate our model on a short-sequence (bAbI) dataset as well as large-sequence textual (NarrativeQA) and video (ActivityNet-QA) question answering datasets, where it achieves substantial improvements over previous memory network approaches. Furthermore, our ablation study confirms the proposed mechanisms' importance for memory models.
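The training scheme described above can be illustrated with a toy sketch. This is an assumption-laden illustration, not the paper's implementation: segments of a token stream are compressed into a fixed-size memory vector, and two auxiliary objectives are computed alongside the QA loss — a rehearsal term that reconstructs masked tokens from a past segment, and an anticipation term that does the same for the upcoming segment. Real mentions of coreferent entities would replace the random token IDs used here.

```python
# Toy sketch (NOT the paper's code) of segment-wise streaming training with
# rehearsal and anticipation objectives over a fixed-capacity memory.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16
embed = rng.normal(size=(VOCAB, DIM))  # stand-in token embedding table

def update_memory(memory, segment_ids):
    """Compress a new segment into the fixed-size memory (moving average)."""
    seg = embed[segment_ids].mean(axis=0)
    return 0.9 * memory + 0.1 * seg  # capacity is fixed, so old info decays

def masked_loss(memory, target_ids, mask_frac=0.3):
    """Squared error between a memory readout and embeddings of masked tokens.
    Rehearsal: targets come from a past segment; anticipation: from the next."""
    n_mask = max(1, int(len(target_ids) * mask_frac))
    masked = rng.choice(target_ids, size=n_mask, replace=False)
    readout = np.tile(memory[None, :], (n_mask, 1))  # trivial "decoder"
    return float(((readout - embed[masked]) ** 2).mean())

stream = [rng.integers(0, VOCAB, size=10) for _ in range(5)]  # 5 segments
memory = np.zeros(DIM)
for t, segment in enumerate(stream):
    memory = update_memory(memory, segment)
    # Rehearsal: reconstruct masked tokens from a segment seen so far.
    rehearsal = masked_loss(memory, stream[rng.integers(0, t + 1)])
    # Anticipation: predict masked tokens of the segment not yet seen.
    if t + 1 < len(stream):
        anticipation = masked_loss(memory, stream[t + 1])
```

In the actual model these auxiliary losses would be added to the question answering loss, so the memory is shaped by what must be remembered and what is likely to come next, rather than by answer errors alone.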

Abstract (translated)

现有的问答方法通常假设输入内容(例如文档或视频)总是可用来解决任务。另一方面,记忆网络的提出是为了模拟人类在固定容量记忆中逐步理解和压缩信息的过程。然而,这些模型仅通过在整个网络中反向传播答案误差来学习维护记忆。相反,有研究指出人类拥有增强记忆能力的有效机制,例如复述(rehearsal)和预期(anticipation)。受此启发,我们提出了一个记忆模型,在处理输入的同时进行复述和预期,以记忆重要信息,从而解决流数据上的问答任务。所提出的机制在训练过程中通过聚焦于共指(coreference)信息的掩码建模任务以自监督方式应用。我们在短序列(bAbI)数据集以及长序列文本(NarrativeQA)和视频(ActivityNet-QA)问答数据集上验证了我们的模型,相比先前的记忆网络方法取得了显著的改进。此外,我们的消融研究证实了所提出机制对记忆模型的重要性。

URL

https://arxiv.org/abs/2305.07565

PDF

https://arxiv.org/pdf/2305.07565.pdf
