CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction

Abstract

Cyber mapping provides insight into the relationships among financial entities and service providers. Focusing on the outsourcing practices of companies as described in fund prospectuses in Germany, we introduce a dataset specifically designed for named entity recognition and relation extraction. Three experts labeled 948 sentences, yielding 5,969 annotations for four entity types (Outsourcing, Company, Location and Software) and 4,102 relation annotations (Outsourcing-Company, Company-Location). State-of-the-art deep learning models trained to recognize the entities and extract the relations show promising first results. An anonymized version of the dataset, along with guidelines and the code used for model training, is publicly available at this https URL.
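To make the annotation scheme concrete, the following is a minimal sketch of how one sentence with the four entity types and the two relation types named in the abstract might be represented. The class names, field names, and the example sentence are hypothetical and invented for illustration; they are not taken from the released dataset, whose actual file format may differ.

```python
from dataclasses import dataclass, field

# Entity and relation types as listed in the abstract.
ENTITY_TYPES = {"Outsourcing", "Company", "Location", "Software"}
RELATION_TYPES = {("Outsourcing", "Company"), ("Company", "Location")}


@dataclass(frozen=True)
class Entity:
    """A labeled character span; hypothetical layout, not the dataset's format."""
    start: int
    end: int
    label: str


@dataclass
class AnnotatedSentence:
    text: str
    entities: list = field(default_factory=list)
    # Each relation is a (head_index, tail_index) pair into `entities`.
    relations: list = field(default_factory=list)

    def add_entity(self, surface: str, label: str) -> int:
        """Annotate the first occurrence of `surface` and return its index."""
        if label not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {label}")
        start = self.text.index(surface)  # raises ValueError if absent
        self.entities.append(Entity(start, start + len(surface), label))
        return len(self.entities) - 1

    def add_relation(self, head: int, tail: int) -> None:
        """Link two entities, allowing only the two relation types above."""
        pair = (self.entities[head].label, self.entities[tail].label)
        if pair not in RELATION_TYPES:
            raise ValueError(f"unsupported relation: {pair}")
        self.relations.append((head, tail))


# Invented example sentence with a placeholder company name.
sentence = AnnotatedSentence(
    "Die Fondsbuchhaltung wurde an die Beispiel GmbH in Frankfurt ausgelagert."
)
outsourcing = sentence.add_entity("Fondsbuchhaltung", "Outsourcing")
company = sentence.add_entity("Beispiel GmbH", "Company")
location = sentence.add_entity("Frankfurt", "Location")
sentence.add_relation(outsourcing, company)  # Outsourcing-Company
sentence.add_relation(company, location)     # Company-Location

for head, tail in sentence.relations:
    h, t = sentence.entities[head], sentence.entities[tail]
    print(h.label, sentence.text[h.start:h.end], "->",
          t.label, sentence.text[t.start:t.end])
```

Storing relations as index pairs into the entity list keeps each relation tied to a specific span, which matters when the same company name occurs more than once in a sentence.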

URL

https://arxiv.org/abs/2403.15322

PDF

https://arxiv.org/pdf/2403.15322.pdf
