Abstract
This paper describes our participation in the Shared Task on Software Mentions Disambiguation (SOMD), focusing on improving relation extraction in scholarly texts by framing it as single-choice question answering (QA) with Generative Language Models (GLMs). The methodology relies on the in-context learning capabilities of GLMs to extract software-related entities and their descriptive attributes, such as distributive information. Our approach combines Retrieval-Augmented Generation (RAG) techniques with GLMs for Named Entity Recognition (NER) and Attributive NER, and then identifies relationships between the extracted software entities, providing a structured solution for analysing software citations in academic literature. The paper describes the approach in detail and demonstrates how using GLMs in a single-choice QA paradigm can greatly enhance information extraction (IE) methodologies. Our participation in the SOMD shared task highlights the importance of precise software citation practices and showcases our system's ability to address the challenges of disambiguating software mentions and extracting relationships between them. This lays the groundwork for future research and development in this field.
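To make the single-choice QA framing concrete, the following is a minimal sketch of how a relation-extraction decision for one entity pair could be posed as a lettered multiple-choice question and how a model's reply could be mapped back to a label. It is not the authors' implementation: the relation labels, the prompt wording, and the function names `build_question` and `parse_answer` are all assumptions made for illustration, and the call to an actual GLM (as well as the RAG step that would select in-context examples) is deliberately left out.

```python
from string import ascii_uppercase

# Hypothetical relation labels, chosen for illustration only;
# the shared task's real label set may differ.
RELATIONS = ["version_of", "developer_of", "url_of", "no_relation"]


def build_question(sentence, software, attribute, relations=RELATIONS):
    """Pose the relation between two extracted entities as a single-choice question."""
    options = [
        f"{ascii_uppercase[i]}) {label}" for i, label in enumerate(relations)
    ]
    return (
        f"Sentence: {sentence}\n"
        f'Which relation holds between "{attribute}" and "{software}"?\n'
        + "\n".join(options)
        + "\nAnswer with a single letter."
    )


def parse_answer(reply, relations=RELATIONS):
    """Map the first character of a model reply to a relation label, or None."""
    text = reply.strip().upper()
    if not text:
        return None
    index = ascii_uppercase.find(text[0])
    if 0 <= index < len(relations):
        return relations[index]
    return None


if __name__ == "__main__":
    prompt = build_question(
        "We analysed the data with SPSS 25 (IBM).", "SPSS", "IBM"
    )
    print(prompt)
    # A reply such as "B" would be produced by the language model (not called here).
    print(parse_answer("B"))
```

Restricting the model to one letter from a fixed option list keeps the output trivially parseable, which is one plausible reason a single-choice formulation suits relation extraction with generative models.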
URL
https://arxiv.org/abs/2404.05587