Making History Readable

2024-11-26 17:06:58
Bipasha Banerjee, Jennifer Goyne, William A. Ingram

Abstract

The Virginia Tech University Libraries (VTUL) Digital Library Platform (DLP) hosts digital collections that offer our users access to a wide variety of documents of historical and cultural importance. These collections are not only of academic importance but also give our users a glimpse of local historical events. Our DLP contains collections of digital objects with complex layouts, faded imagery, and hard-to-read handwritten text, which makes providing online access to these materials challenging. To address these issues, we integrate AI into our DLP workflow and convert the text in the digital objects into a machine-readable format. To enhance the user experience with our historical collections, we use custom AI agents for handwriting recognition and text extraction, and large language models (LLMs) for summarization. This poster highlights three collections focusing on handwritten letters, newspapers, and digitized topographic maps. We discuss the challenges with each collection and detail our approaches to address them. Our proposed methods aim to enhance the user experience by making the contents of these collections easier to search and navigate.

URL

https://arxiv.org/abs/2411.17600

PDF

https://arxiv.org/pdf/2411.17600.pdf

