CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News

Abstract

Extracting structured event knowledge, including event triggers and their corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military domain suffers from data scarcity, which impedes research on event extraction models for this domain. To alleviate this problem, we propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, all manually annotated according to a pre-defined schema for the military domain comprising 8 event types and 11 argument role types. We designed a two-stage, multi-turn annotation strategy to ensure the quality of CMNEE, and we reproduced several state-of-the-art event extraction models for a systematic evaluation. The experimental results on CMNEE fall clearly short of those on datasets from other domains, which demonstrates that event extraction in the military domain poses unique challenges and requires further research. Our code and data can be obtained from this https URL.
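
To make the abstract's terminology concrete, the following is a minimal sketch of how a document-level event record (a trigger, an event type, and role-labelled arguments) might be represented, together with the average number of events per document implied by the reported counts. The class names, field names, and the example event type, roles, and text are illustrative assumptions; they are not taken from the CMNEE schema or its released data format.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    role: str  # one of the schema's argument role types (name here is hypothetical)
    text: str  # the argument span as it appears in the document


@dataclass
class Event:
    event_type: str  # one of the schema's event types (name here is hypothetical)
    trigger: str     # the word or phrase that signals the event
    arguments: list[Argument] = field(default_factory=list)


@dataclass
class Document:
    doc_id: str
    text: str
    events: list[Event] = field(default_factory=list)


# Counts reported in the abstract.
NUM_DOCUMENTS = 17_000
NUM_EVENTS = 29_223


def events_per_document(num_events: int, num_documents: int) -> float:
    """Average number of annotated events per document."""
    return num_events / num_documents


# A made-up record, for illustration only (not drawn from the dataset).
example = Document(
    doc_id="demo-0001",
    text="Two countries held a joint naval exercise on Monday.",
    events=[
        Event(
            event_type="Exercise",  # hypothetical type label
            trigger="exercise",
            arguments=[
                Argument(role="Subject", text="Two countries"),  # hypothetical role
                Argument(role="Date", text="Monday"),            # hypothetical role
            ],
        )
    ],
)

average = round(events_per_document(NUM_EVENTS, NUM_DOCUMENTS), 2)
print(f"{average} events per document on average")
```

Under these counts the average works out to roughly 1.72 events per document, which is consistent with the dataset being annotated at the document level rather than one event per sentence or per text.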

URL

https://arxiv.org/abs/2404.12242

PDF

https://arxiv.org/pdf/2404.12242.pdf
