Abstract
With the rise of the digital age, there has been an explosion of information in the form of news, articles, social media, and so on. Much of this data is unstructured, and managing and making effective use of it manually is tedious and labor-intensive. This explosion of information, together with the need for more sophisticated and efficient information-handling tools, has given rise to Information Extraction (IE) and Information Retrieval (IR) technology. Information Extraction systems take natural language text as input and produce structured information, specified by certain criteria, that is relevant to a particular application. Sub-tasks of IE such as Named Entity Recognition, Coreference Resolution, Named Entity Linking, Relation Extraction, and Knowledge Base Reasoning form the building blocks of high-end Natural Language Processing (NLP) tasks such as Machine Translation, Question Answering, Natural Language Understanding, and Text Summarization, and of digital assistants like Siri, Cortana, and Google Now. This paper introduces Information Extraction technology and its sub-tasks, and highlights state-of-the-art research in each sub-task, current challenges, and future research directions.
URL
https://arxiv.org/abs/1807.02383