Abstract
Large Language Models (LLMs) and Multimodal LLMs have shown promising capabilities for SVG processing, yet existing benchmarks suffer from limited real-world coverage, lack of complexity stratification, and fragmented evaluation paradigms. We introduce SVGenius, a comprehensive benchmark comprising 2,377 queries across three progressive dimensions: understanding, editing, and generation. Built on real-world data from 24 application domains with systematic complexity stratification, SVGenius evaluates models through 8 task categories and 18 metrics. We assess 22 mainstream models spanning different scales, architectures, training paradigms, and accessibility levels. Our analysis reveals that proprietary models significantly outperform open-source counterparts, yet all models exhibit systematic performance degradation with increasing complexity, indicating fundamental limitations in current approaches. Reasoning-enhanced training proves more effective than pure scaling for overcoming these limitations, though style transfer remains the most challenging capability across all model types. SVGenius establishes the first systematic evaluation framework for SVG processing, providing crucial insights for developing more capable vector graphics models and advancing automated graphic design applications. Appendix and supplementary materials (including all data and code) are available at this https URL.
URL
https://arxiv.org/abs/2506.03139