
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation

2025-06-03 17:58:57
Siqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang, Guiyang Hou, Yongliang Shen, Weiming Lu, Yueting Zhuang

Abstract

Large Language Models (LLMs) and Multimodal LLMs have shown promising capabilities for SVG processing, yet existing benchmarks suffer from limited real-world coverage, lack of complexity stratification, and fragmented evaluation paradigms. We introduce SVGenius, a comprehensive benchmark of 2,377 queries spanning three progressive dimensions: understanding, editing, and generation. Built on real-world data from 24 application domains with systematic complexity stratification, SVGenius evaluates models through 8 task categories and 18 metrics. We assess 22 mainstream models spanning different scales, architectures, training paradigms, and accessibility levels. Our analysis reveals that while proprietary models significantly outperform open-source counterparts, all models exhibit systematic performance degradation as complexity increases, indicating fundamental limitations in current approaches. However, reasoning-enhanced training proves more effective than pure scaling at overcoming these limitations, though style transfer remains the most challenging capability across all model types. SVGenius establishes the first systematic evaluation framework for SVG processing, providing crucial insights for developing more capable vector graphics models and advancing automated graphic design applications. Appendix and supplementary materials (including all data and code) are available at this https URL.
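To make "complexity stratification" concrete, the sketch below shows one hypothetical way to bucket SVG files into difficulty tiers. It is not SVGenius's method: the abstract does not specify its criteria. The scoring rule (element count plus path-command count), the names `svg_complexity` and `complexity_tier`, and the tier thresholds are all illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# HYPOTHETICAL tier thresholds (inclusive upper bounds); SVGenius's actual
# stratification criteria are not given in the abstract and may differ.
TIER_THRESHOLDS = [(20, "easy"), (80, "medium")]

# SVG path-data command letters, upper case (absolute) and lower case (relative).
PATH_COMMAND_RE = re.compile(r"[MLHVCSQTAZmlhvcsqtaz]")


def svg_complexity(svg_text: str) -> int:
    """Toy complexity score: number of elements plus number of path commands."""
    root = ET.fromstring(svg_text)
    score = 0
    for elem in root.iter():  # walks every element, including the root <svg>
        score += 1
        d = elem.get("d")  # path data lives in the un-namespaced "d" attribute
        if d:
            score += len(PATH_COMMAND_RE.findall(d))
    return score


def complexity_tier(svg_text: str) -> str:
    """Map the toy score onto a coarse easy/medium/hard tier."""
    score = svg_complexity(svg_text)
    for upper, label in TIER_THRESHOLDS:
        if score <= upper:
            return label
    return "hard"


if __name__ == "__main__":
    simple = '<svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0 L10 10 Z"/></svg>'
    print(svg_complexity(simple), complexity_tier(simple))  # 5 easy
```

A real benchmark would likely weigh further factors (nesting depth, gradients, transforms, text), but even a crude score like this is enough to split a corpus into tiers and check whether model accuracy falls as the tier rises.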


URL

https://arxiv.org/abs/2506.03139

PDF

https://arxiv.org/pdf/2506.03139.pdf

