OntoSenseNet: A Verb-Centric Ontological Resource for Indian Languages

Abstract

Following approaches to understanding lexical meaning developed by Yaska, Patanjali and Bhartrihari in the Indian linguistic traditions, and extending approaches developed by Leibniz and Brentano in modern times, a framework of formal ontology of language was developed. This framework proposes that the meaning of words is in-formed by intrinsic and extrinsic ontological structures. The paper aims to capture such intrinsic and extrinsic meanings of words for two major Indian languages, namely Hindi and Telugu. Parts of speech have been rendered into sense-types and sense-classes. Using them, we have developed a gold-standard annotated lexical resource to support semantic understanding of a language. The resource comprises a collection of Hindi and Telugu lexicons, which have been manually annotated by native speakers of the languages following our annotation guidelines. Further, the resource was utilised to derive the adverbial sense-class distribution of verbs and the karaka-verb sense-type distribution. Different corpora (news, novels) were compared using their verb sense-type distributions. Word embeddings were used as an aid for the enrichment of the resource. This is a work in progress that aims at extensive lexical coverage of the languages.
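
The corpus comparison mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon, the lemmas, the sense-type labels, the function names, and the use of Jensen-Shannon divergence as the comparison measure. The abstract does not say how the distributions were compared, so this is not the authors' method.

```python
from collections import Counter
import math


def sense_type_distribution(verb_tokens, lexicon):
    """Relative frequency of each sense-type among the verbs found in the lexicon.

    `lexicon` maps a verb lemma to one sense-type label (hypothetical format).
    Verbs missing from the lexicon are skipped.
    """
    counts = Counter(lexicon[v] for v in verb_tokens if v in lexicon)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two label distributions.

    An assumed comparison measure: 0 means identical, 1 means disjoint.
    """
    labels = set(p) | set(q)
    mix = {l: 0.5 * (p.get(l, 0.0) + q.get(l, 0.0)) for l in labels}

    def kl(a):
        # Kullback-Leibler divergence of `a` from the mixture, skipping zero terms.
        return sum(a[l] * math.log2(a[l] / mix[l]) for l in a if a[l] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Toy data: placeholder lemmas and labels, not entries from the resource.
toy_lexicon = {"run": "Move", "think": "Know", "make": "Do"}
news_verbs = ["run", "make", "make", "think"]
novel_verbs = ["think", "think", "run", "see"]  # "see" is not in the toy lexicon

news_dist = sense_type_distribution(news_verbs, toy_lexicon)
novel_dist = sense_type_distribution(novel_verbs, toy_lexicon)
divergence = js_divergence(news_dist, novel_dist)
print(news_dist, novel_dist, divergence)
```

A bounded, symmetric measure is convenient here because two corpora may not share every sense-type; other measures, such as a chi-squared test on the raw counts, would be equally reasonable choices.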

URL

https://arxiv.org/abs/1808.00694

PDF

https://arxiv.org/pdf/1808.00694.pdf