FinMTEB: Finance Massive Text Embedding Benchmark

2025-02-16 04:23:52
Yixuan Tang, Yi Yang

Abstract

Embedding models play a crucial role in representing and retrieving information across various NLP applications. Recent advances in large language models (LLMs) have further enhanced the performance of embedding models. While these models are often benchmarked on general-purpose datasets, real-world applications demand domain-specific evaluation. In this work, we introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a specialized counterpart to MTEB designed for the financial domain. FinMTEB comprises 64 financial domain-specific embedding datasets across 7 tasks that cover diverse textual types in both Chinese and English, such as financial news articles, corporate annual reports, ESG reports, regulatory filings, and earnings call transcripts. We also develop a finance-adapted model, FinPersona-E5, using a persona-based data synthesis method to generate training data covering diverse financial embedding tasks. Through extensive evaluation of 15 embedding models, including FinPersona-E5, we report three key findings: (1) performance on general-purpose benchmarks shows limited correlation with performance on financial domain tasks; (2) domain-adapted models consistently outperform their general-purpose counterparts; and (3) surprisingly, a simple Bag-of-Words (BoW) approach outperforms sophisticated dense embeddings in financial Semantic Textual Similarity (STS) tasks, underscoring current limitations in dense embedding techniques. Our work establishes a robust evaluation framework for financial NLP applications and provides crucial insights for developing domain-specific embedding models.
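
To make finding (3) concrete, the following is a minimal sketch of what a Bag-of-Words similarity score for a sentence pair can look like. It is an illustration only and not the authors' implementation: the tokenizer (lowercasing and splitting on non-alphanumeric characters), the use of raw token counts, and the function names `bow_vector`, `cosine_similarity`, and `bow_similarity` are all assumptions made here for the example.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Build a sparse token-count vector (assumed tokenizer: lowercase alphanumerics)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    # Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def bow_similarity(sentence_1, sentence_2):
    """Score a sentence pair by the cosine of their Bag-of-Words vectors."""
    return cosine_similarity(bow_vector(sentence_1), bow_vector(sentence_2))


if __name__ == "__main__":
    print(bow_similarity("Net income rose in Q3", "Net income fell in Q3"))
```

In an STS evaluation, scores like these would typically be compared against human similarity labels with a rank correlation; that step is omitted here because the abstract does not specify the metric.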

URL

https://arxiv.org/abs/2502.10990

PDF

https://arxiv.org/pdf/2502.10990.pdf

