Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning

2025-10-09 16:08:48
Sofia Kirsanova, Yao-Yi Chiang, Weiwei Duan

Abstract

Historical map legends are critical for interpreting cartographic symbols. However, their inconsistent layouts and unstructured formats make automatic extraction challenging. Prior work focuses primarily on segmentation or general optical character recognition (OCR), with few methods effectively matching legend symbols to their corresponding descriptions in a structured manner. We present a method that combines LayoutLMv3 for layout detection with GPT-4o using in-context learning to detect and link legend items and their descriptions via bounding box predictions. Our experiments show that GPT-4o with structured JSON prompts outperforms the baseline, achieving 88% F1 and 85% IoU, and reveal how prompt design, example counts, and layout alignment affect performance. This approach supports scalable, layout-aware legend parsing and improves the indexing and searchability of historical maps across various visual styles.
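
The abstract reports results as F1 and IoU over bounding box predictions. As a rough illustration of what those two metrics measure, here is a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and precomputed match counts. The function names, the box format, and the way predictions are matched to ground truth are assumptions for illustration, not the paper's evaluation protocol, which the abstract does not specify.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(f"IoU: {iou(predicted, ground_truth):.3f}")
    print(f"F1:  {f1_score(tp=8, fp=1, fn=1):.3f}")
```

A predicted legend item would typically count as a true positive when its IoU with a ground-truth box exceeds some threshold; the threshold is a design choice that this sketch leaves open.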

URL

https://arxiv.org/abs/2510.08385

PDF

https://arxiv.org/pdf/2510.08385.pdf
