Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction

2019-04-29 09:37:31
Mingming Sun, Xu Li, Xin Wang, Miao Fan, Yue Feng, Ping Li

Abstract

In this paper, we consider the problem of open information extraction (OIE): extracting entity- and relation-level intermediate structures from open-domain sentences. We focus on four types of valuable intermediate structures (Relation, Attribute, Description, and Concept) and propose a unified knowledge expression form, SAOKE, to express them. We publicly release a data set that contains more than forty thousand sentences and the corresponding facts in the SAOKE format, labeled by crowd-sourcing. To our knowledge, this is the largest publicly available human-labeled data set for open information extraction tasks. Using this labeled SAOKE data set, we train an end-to-end neural model, called Logician, that uses the sequence-to-sequence paradigm to transform sentences into facts. Unlike existing algorithms, which generally extract each fact in isolation without considering other possible facts, Logician performs a global optimization over all facts possibly involved in a sentence, in which facts not only compete with each other to attract the attention of words but also cooperate to share words. An experimental study on various types of open-domain relation extraction tasks shows that Logician consistently outperforms other state-of-the-art algorithms. The experiments support the reasonableness of the SAOKE format, the value of the SAOKE data set, the effectiveness of the proposed Logician model, and the feasibility of applying the end-to-end learning paradigm to supervised data sets for the challenging task of open information extraction.

URL

https://arxiv.org/abs/1904.12535

PDF

https://arxiv.org/pdf/1904.12535.pdf

