Beyond surface form: A pipeline for semantic analysis in Alzheimer's Disease detection from spontaneous speech

2025-12-15 18:59:49
Dylan Phelps, Rodrigo Wilkens, Edward Gow-Smith, Lilian Hubner, Bárbara Malcorra, César Rennó-Costa, Marco Idiart, Maria-Cruz Villa-Uriol, Aline Villavicencio

Abstract

Alzheimer's Disease (AD) is a progressive neurodegenerative condition that adversely affects cognitive abilities. Language-related changes can be automatically identified through the analysis of outputs from linguistic assessment tasks, such as picture description. Language models show promise as a basis for screening tools for AD, but their limited interpretability makes it difficult to distinguish true linguistic markers of cognitive decline from surface-level textual patterns. To address this issue, we examine how surface form variation affects classification performance, with the goal of assessing the ability of language models to represent underlying semantic indicators. We introduce a novel approach in which the surface forms of texts are transformed by altering syntax and vocabulary while preserving semantic content. The transformations significantly modify structure and lexical content, as indicated by low BLEU and chrF scores, yet retain the underlying semantics, as reflected in high semantic similarity scores. This isolates the effect of semantic information: models perform similarly on the transformed texts and on the original text, with only small deviations in macro-F1. We also investigate whether language from picture descriptions retains enough detail to reconstruct the original image using generative models, and find that image-based transformations add substantial noise, reducing classification accuracy. Our methodology provides a novel way of examining which features influence model predictions and allows possible spurious correlations to be removed. We find that language-model-based classifiers can still detect AD using semantic information alone. This work shows that difficult-to-detect semantic impairment can be identified, addressing an overlooked feature of linguistic deterioration and opening new pathways for early detection systems.
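
The two measurements the abstract relies on, surface overlap between an original and a transformed text and macro-F1 of the downstream classifier, can be illustrated with a minimal sketch. This is not the authors' pipeline: `char_ngram_f` is a simplified, chrF-style character n-gram score (it averages a per-order F-score rather than following the official chrF definition), the function names are hypothetical, and the example sentences and labels are invented for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def char_ngram_f(hyp, ref, max_n=3, beta=2.0):
    """Simplified chrF-style score (assumption: mean F-beta over orders 1..max_n)."""
    scores = []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue  # text too short for this n-gram order
        overlap = sum((h & r).values())  # clipped n-gram matches
        precision = overlap / sum(h.values())
        recall = overlap / sum(r.values())
        if precision + recall == 0:
            scores.append(0.0)
        else:
            b2 = beta ** 2
            scores.append((1 + b2) * precision * recall / (b2 * precision + recall))
    return sum(scores) / len(scores) if scores else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Invented example: a description and a paraphrase with different wording.
original = "the boy is taking cookies from the jar"
paraphrase = "a child reaches into a container of biscuits"
surface_overlap = char_ngram_f(paraphrase, original)

# Invented labels: AD = Alzheimer's Disease, HC = healthy control.
gold = ["AD", "AD", "HC", "HC"]
pred = ["AD", "HC", "HC", "HC"]
score = macro_f1(gold, pred)
```

In a setup like the one described, a low surface-overlap score paired with a high semantic similarity score would indicate that a transformation changed the wording but not the meaning; comparing macro-F1 on original and transformed texts then shows how much the classifier depends on surface form. Semantic similarity is omitted here because it requires an embedding model, which the abstract does not name.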

URL

https://arxiv.org/abs/2512.13685

PDF

https://arxiv.org/pdf/2512.13685.pdf

