Abstract
Relation extraction is a critical task in natural language processing with numerous real-world applications. Existing research primarily focuses on monolingual relation extraction or on cross-lingual enhancement for relation extraction. Yet there remains a significant gap in understanding relation extraction in the mix-lingual (or code-switching) scenario, where individuals intermix content from different languages within a sentence. Because no dedicated dataset exists, the effectiveness of existing relation extraction models in this scenario is largely unexplored. To address this issue, we introduce MixRE, a novel task that considers relation extraction in the mix-lingual scenario, and construct the human-annotated dataset MixRED to support it. We then evaluate both state-of-the-art supervised models and large language models (LLMs) on MixRED, revealing their respective advantages and limitations in the mix-lingual scenario. Furthermore, we examine the factors that influence model performance on the MixRE task and uncover promising directions for improving both supervised models and LLMs on it.
URL
https://arxiv.org/abs/2403.15696