Multilingual Evaluation of Semantic Textual Relatedness

Abstract

The explosive growth of online content demands robust Natural Language Processing (NLP) techniques that can capture nuanced meanings and cultural context across diverse languages. Semantic Textual Relatedness (STR) goes beyond superficial word overlap, taking into account both linguistic elements and non-linguistic factors such as topic, sentiment, and perspective. Despite its pivotal role, prior NLP research has predominantly focused on English, limiting its applicability to other languages. To address this gap, we explore STR in Marathi, Hindi, Spanish, and English, with potential applications in information retrieval, machine translation, and more. Using the SemEval-2024 shared task, we evaluate various language models across three learning paradigms: supervised, unsupervised, and cross-lingual. Our comprehensive methodology yields promising results, demonstrating the effectiveness of our approach. This work aims not only to showcase our achievements but also to inspire further research in multilingual STR, particularly for low-resource languages.
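To make the contrast with "superficial word overlap" concrete, the following is a minimal sketch of a purely lexical baseline, the kind of measure STR is meant to go beyond. It is not the authors' method: the function names `tokenize` and `dice_overlap`, the whitespace tokenization, and the example sentences are all assumptions chosen for illustration.

```python
def tokenize(sentence: str) -> set[str]:
    """Lowercase and split on whitespace (an assumed, deliberately naive tokenizer)."""
    return set(sentence.lower().split())


def dice_overlap(a: str, b: str) -> float:
    """Dice coefficient over the two sentences' token sets, in [0, 1]."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 0.0  # two empty sentences: define the score as 0
    shared = tokens_a & tokens_b
    return 2 * len(shared) / (len(tokens_a) + len(tokens_b))


# Hypothetical sentence pairs, not taken from the shared-task data.
lexically_close = dice_overlap("The cat sat on the mat", "A cat was sitting on a mat")
related_but_disjoint = dice_overlap("He loves football", "Soccer is his favourite sport")
```

The second pair is topically related yet shares no tokens, so the lexical score is zero. That is the gap a relatedness model is expected to close.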

URL

https://arxiv.org/abs/2404.09047

PDF

https://arxiv.org/pdf/2404.09047.pdf
