Automatic Classification of Arabic Literature into Historical Eras

2026-01-22 17:32:19
Zainab Alhathloul, Irfan Ahmad

Abstract

The Arabic language has undergone notable transformations over time, including the emergence of new vocabulary, the obsolescence of older words, and shifts in word usage. This evolution is evident in the distinction between the classical and modern Arabic eras. Although historians and linguists have partitioned Arabic literature into multiple eras, relatively little research has explored the automatic classification of Arabic texts by time period, particularly beyond the domain of poetry. This paper addresses this gap by employing neural networks and deep learning techniques to automatically classify Arabic texts into distinct eras and periods. The proposed models are evaluated using two datasets derived from two publicly available corpora, covering texts from the pre-Islamic to the modern era. The study examines class setups ranging from binary to 15-class classification and considers both predefined historical eras and custom periodizations. Results range from F1-scores of 0.83 and 0.79 on the binary-era classification task using the OpenITI and APCD datasets, respectively, to 0.20 on the 15-era classification task using OpenITI and 0.18 on the 12-era classification task using APCD.
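
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score can be computed for an era classifier. The abstract does not state which averaging scheme the authors use, so the macro average here is an assumption, and the function names and the toy labels "classical" and "modern" are purely illustrative rather than taken from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted this label, but it was wrong
            fn[gold] += 1  # missed the gold label
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy binary-era example with hypothetical labels (not data from the paper).
y_true = ["classical", "classical", "modern", "modern"]
y_pred = ["classical", "modern", "modern", "modern"]
print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Under a macro average every era counts equally, so a setup with many eras is penalized whenever any one era is predicted poorly, which is one plausible reading of the drop from the binary to the 12-era and 15-era scores.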

URL

https://arxiv.org/abs/2601.16138

PDF

https://arxiv.org/pdf/2601.16138.pdf

