Abstract
Historical map legends are critical for interpreting cartographic symbols. However, their inconsistent layouts and unstructured formats make automatic extraction challenging. Prior work focuses primarily on segmentation or general optical character recognition (OCR), and few methods effectively match legend symbols to their corresponding descriptions in a structured manner. We present a method that combines LayoutLMv3 for layout detection with GPT-4o using in-context learning to detect and link legend items and their descriptions via bounding box predictions. Our experiments show that GPT-4o with structured JSON prompts outperforms the baseline, achieving an F1 score of 88% and an IoU of 85%, and reveal how prompt design, example counts, and layout alignment affect performance. This approach supports scalable, layout-aware legend parsing and improves the indexing and searchability of historical maps across various visual styles.
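The abstract reports results as F1 and IoU over predicted bounding boxes but does not spell out how they are computed. A minimal sketch of the standard definitions follows, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixel coordinates. The function names and the box format are illustrative assumptions, not taken from the paper.

```python
from typing import Tuple

# Assumed box format (not specified in the abstract): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle, if the boxes overlap at all.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positive, false positive, and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` has an intersection of 1 and a union of 7, so it evaluates to 1/7. How a predicted legend item is counted as a true positive (for instance, an IoU threshold) is an evaluation choice the abstract does not state.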
URL
https://arxiv.org/abs/2510.08385