Abstract
In scientific disciplines where research findings have a strong impact on society, reducing the time it takes to understand, synthesize, and exploit the research is invaluable. Topic modeling is an effective technique for summarizing a collection of documents: it finds the main themes among them and can classify other documents that have a similar mixture of co-occurring words. We show how grounding a topic model with an ontology, extracted from a glossary of important domain phrases, improves the generated topics and makes them easier to understand. We apply this method to the climate science domain and evaluate it there. The resulting topics support faster research understanding, discovery of social networks among researchers, and automatic ontology generation.
URL
https://arxiv.org/abs/1807.10965