Paper Reading AI Learner

Instruct-Tuning Pretrained Causal Language Models for Ancient Greek Papyrology and Epigraphy

2024-09-20 19:49:45
Eric Cullhed

Abstract

This article presents an experiment in fine-tuning a pretrained causal language model (Meta's Llama 3.1 8B Instruct) for aiding in three fundamental tasks of philological research: chronological and geographic attribution as well as text restoration in ancient Greek inscriptions and documentary papyri. Using a prompt-based instruct approach, the fine-tuned models surpass the state of the art in key metrics. For inscriptions, the models achieve a lower average character error rate (CER) of 22.5% (vs. 26.3%), while closely matching top-1 accuracy (60.9% vs. 61.8%) and top-20 accuracy (77.5% vs. 78.3%) for sequences up to 10 characters. They also provide a practical advantage by ignoring spaces during reconstruction, aligning better with the scriptio continua typically used in ancient written artifacts. In geographic attribution, the model outperforms previous benchmarks with a top-1 accuracy of 75.0% (vs. 70.8%) and a top-3 accuracy of 83.7% (vs. 82.1%). For dating, it achieves an average deviation of 26.2 years (vs. 29.3) and a median deviation of 1 year (vs. 3) from the actual date range. The models also set new baselines for documentary papyri, with a CER of 16.3%, a top-1 accuracy of 71.3%, and top-20 of 85.0% in text reconstruction; a top-1 accuracy of 66.4% and top-3 of 79.9% in geographic attribution; and, in chronological attribution, a deviation of 21.7 years from the actual termini post/ante quem, with a median deviation of 0 years.
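The headline metric, character error rate, is a standard normalized edit distance. A minimal sketch of how a space-insensitive CER might be computed (the abstract notes the models ignore spaces, matching scriptio continua; the function names here are illustrative, not from the paper):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insertions,
    # deletions, substitutions all cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(prediction: str, target: str) -> float:
    # Strip spaces before comparing, so word division (absent in
    # scriptio continua) does not count toward the error rate.
    p = prediction.replace(" ", "")
    t = target.replace(" ", "")
    return levenshtein(p, t) / len(t)
```

On this definition, a prediction that differs from the target only in spacing scores a CER of 0, while one wrong character out of three scores roughly 33%.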

URL

https://arxiv.org/abs/2409.13870

PDF

https://arxiv.org/pdf/2409.13870.pdf

