Cross-Lingual Learning in Multilingual Scene Text Recognition

2023-12-17 20:12:42
Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

Abstract

In this paper, we investigate cross-lingual learning (CLL) for multilingual scene text recognition (STR). CLL transfers knowledge from one language to another. We aim to find the condition under which knowledge from high-resource languages improves performance on low-resource languages. To do so, we first examine whether two general insights about CLL discussed in previous work apply to multilingual STR: (1) joint learning with high- and low-resource languages may reduce performance on low-resource languages, and (2) CLL works best between typologically similar languages. Through extensive experiments, we show that these two insights may not apply to multilingual STR. We then show that the crucial condition for CLL is the dataset size of the high-resource languages, regardless of which high-resource languages are used. Our code, data, and models are available at this https URL.

URL

https://arxiv.org/abs/2312.10806

PDF

https://arxiv.org/pdf/2312.10806.pdf
