
Data-to-Text Bilingual Generation

2023-11-24 19:05:57
Guy Lapalme


This document illustrates the use of pyrealb for generating two parallel texts (English and French) from a single source of data. The data selection and text organisation processes are shared between the two languages; only the language-dependent word and phrasing choices are handled by distinct processes. The realized texts thus convey identical information in both languages, with no risk of it being lost in translation. This is especially important in cases where strict and simultaneous bilingualism is required. We first present the types of applications targeted by this approach and show how the bilingual English and French pyrealb realizer can be used to achieve this goal in a natural way. We describe an object-oriented organization that ensures a convenient realization in both languages. To illustrate the process, different types of applications are then briefly sketched, with links to the source code. Finally, the generated texts are briefly compared with the output of a GPT instance.
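The division of labour described above (shared selection and organisation, language-specific wording) can be sketched as a base class that owns the shared steps and one subclass per language. This is a minimal illustration under stated assumptions, not the author's code: the `Observation` record, the selection criterion, the ordering and all class names are hypothetical, and plain f-string templates stand in for pyrealb realization, whose API is not reproduced here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical data record: one temperature reading for a city."""
    city: str
    temperature: int  # degrees Celsius


class BilingualReport(ABC):
    """Shared pipeline: selection and organisation are identical for every language."""

    def select(self, data):
        # Language-independent content selection (arbitrary illustrative
        # criterion): keep only readings at or above freezing.
        return [obs for obs in data if obs.temperature >= 0]

    def organise(self, selected):
        # Language-independent text organisation: warmest city first.
        return sorted(selected, key=lambda obs: obs.temperature, reverse=True)

    def generate(self, data):
        # Only the final wording of each message differs between languages.
        return " ".join(self.realize(obs) for obs in self.organise(self.select(data)))

    @abstractmethod
    def realize(self, obs):
        """Language-specific word and phrasing choices for one message.

        In the approach described, this step would build a pyrealb
        expression; a string template is used here as a stand-in.
        """


class EnglishReport(BilingualReport):
    def realize(self, obs):
        return f"In {obs.city}, it is {obs.temperature} degrees."


class FrenchReport(BilingualReport):
    def realize(self, obs):
        return f"À {obs.city}, il fait {obs.temperature} degrés."


data = [Observation("Montréal", 3), Observation("Québec", -2), Observation("Ottawa", 5)]
print(EnglishReport().generate(data))  # In Ottawa, it is 5 degrees. In Montréal, it is 3 degrees.
print(FrenchReport().generate(data))   # À Ottawa, il fait 5 degrés. À Montréal, il fait 3 degrés.
```

Because `select` and `organise` live in the base class, both outputs necessarily mention the same cities in the same order; adding a language means writing only a new `realize` method.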



