Profiling German Text Simplification with Interpretable Model-Fingerprints

2026-01-19 13:39:59
Lars Klöser, Mika Beele, Bodo Kraft

Abstract

While Large Language Models (LLMs) produce highly nuanced text simplifications, developers currently lack tools for a holistic, efficient, and reproducible diagnosis of their behavior. This paper introduces the Simplification Profiler, a diagnostic toolkit that generates a multidimensional, interpretable fingerprint of simplified texts. Aggregating the fingerprints of multiple simplifications produced by one model yields that model's fingerprint. This novel evaluation paradigm is particularly vital for languages where the data scarcity problem is magnified when creating flexible models for diverse target groups rather than a single, fixed simplification style. We propose that, in this context, measuring a model's unique behavioral signature is more relevant than correlating metrics with human preferences. We operationalize this with a practical meta-evaluation of our fingerprints' descriptive power, which bypasses the need for large, human-rated datasets. This test measures whether a simple linear classifier can reliably identify various model configurations from the simplifications they produce, confirming that our metrics are sensitive to a model's specific characteristics. The Profiler can distinguish high-level behavioral variations between prompting strategies as well as fine-grained changes from prompt engineering, including few-shot examples. Our complete feature set achieves classification F1-scores of up to 71.9%, improving upon simple baselines by over 48 percentage points. The Simplification Profiler thus offers developers a granular, actionable analysis to build more effective and truly adaptive text simplification systems.
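
The meta-evaluation described in the abstract (can a simple linear classifier tell model configurations apart from their fingerprints?) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-feature synthetic fingerprints, the two hypothetical configurations, the perceptron used as the linear classifier, and the majority-class baseline. The paper's actual features, classifier, and data are not specified in the abstract.

```python
import random


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


def train_perceptron(X, y, epochs=100):
    """Binary perceptron (labels 0/1). Returns the weights, bias stored last."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            target = 1 if label == 1 else -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            if target * score <= 0:  # wrong side or on the boundary: update
                for i, xi in enumerate(x):
                    w[i] += target * xi
                w[-1] += target
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w


def predict(w, X):
    """Predict label 1 where the linear score is positive, else 0."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0 else 0 for x in X]


def synthetic_fingerprints(n_per_config, rng):
    """Toy fingerprints for two hypothetical configurations (assumed data).

    Feature 1 separates the configurations; feature 2 is pure noise.
    """
    X, y = [], []
    for label, centre in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_config):
            X.append([centre + rng.uniform(-0.5, 0.5), rng.uniform(-1.0, 1.0)])
            y.append(label)
    return X, y


def run_meta_evaluation(seed=0):
    """Train on one sample of fingerprints, score on a fresh sample."""
    rng = random.Random(seed)
    X_train, y_train = synthetic_fingerprints(50, rng)
    X_test, y_test = synthetic_fingerprints(50, rng)
    w = train_perceptron(X_train, y_train)
    return {
        "train_f1": macro_f1(y_train, predict(w, X_train)),
        "test_f1": macro_f1(y_test, predict(w, X_test)),
        # Baseline: always predict configuration 0.
        "baseline_f1": macro_f1(y_test, [0] * len(y_test)),
    }


print(run_meta_evaluation())
```

The quantity of interest is the gap between the held-out score and the baseline: if the fingerprint features carry information about the configuration, the linear classifier should beat the baseline clearly. In a real setting the synthetic vectors would be replaced by fingerprints computed from actual simplifications, and a library classifier would likely replace the hand-written perceptron.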

URL

https://arxiv.org/abs/2601.13050

PDF

https://arxiv.org/pdf/2601.13050.pdf

