Abstract
Medical texts are notoriously challenging to read. Properly measuring their readability is the first step towards making them more accessible. In this paper, we present a systematic study of fine-grained readability measurement in the medical domain at both the sentence and span levels. We introduce a new dataset, MedReadMe, which consists of manually annotated readability ratings and fine-grained complex span annotations for 4,520 sentences, featuring two novel categories: "Google-Easy" and "Google-Hard". It supports our quantitative analysis, which covers 650 linguistic features as well as automatic complex word and jargon identification. Enabled by our high-quality annotation, we benchmark and improve several state-of-the-art sentence-level readability metrics specifically for the medical domain, including unsupervised, supervised, and prompting-based methods using recently developed large language models (LLMs). Informed by our fine-grained complex span annotation, we find that adding a single feature, capturing the number of jargon spans, to existing readability formulas significantly improves their correlation with human judgments. We will publicly release the dataset and code.
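The abstract's key finding is that appending a jargon-span count to a classic readability formula improves its correlation with human ratings. The paper's exact formulation is not given here, so the following is only a minimal sketch of the idea: `fk_grade` implements the standard single-sentence Flesch-Kincaid grade, and `augmented_score`, along with the weight `alpha`, are hypothetical names and values (in practice such a weight would be fit to the human annotations).

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of vowels as syllables (at least 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(sentence: str) -> float:
    # Flesch-Kincaid grade level for a single sentence.
    words = re.findall(r"[A-Za-z]+", sentence)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) + 11.8 * (syllables / len(words)) - 15.59

def augmented_score(sentence: str, jargon_spans: list[str], alpha: float = 1.0) -> float:
    # Hypothetical augmentation: penalize by the number of jargon spans.
    # `alpha` would be fit against human readability judgments.
    return fk_grade(sentence) + alpha * len(jargon_spans)

sentence = "The patient presents with idiopathic intracranial hypertension."
print(augmented_score(sentence, ["idiopathic intracranial hypertension"]))
```

The augmented score degrades gracefully: with no jargon spans it reduces to the plain formula, so the extra feature can only shift rankings where annotated jargon is present.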
URL
https://arxiv.org/abs/2405.02144