Explainable Emotion Decoding for Human and Computer Vision

2024-08-01 11:53:44
Alessio Borriero, Martina Milazzo, Matteo Diano, Davide Orsenigo, Maria Chiara Villa, Chiara Di Fazio, Marco Tamietto, Alan Perotti

Abstract

Modern Machine Learning (ML) has significantly advanced various research fields, but the opaque nature of ML models hinders their adoption in several domains. Explainable AI (XAI) addresses this challenge by providing additional information to help users understand the internal decision-making process of ML models. In the field of neuroscience, enriching an ML model for brain decoding with attribution-based XAI techniques means being able to highlight which brain areas correlate with the task at hand, thus offering valuable insights to domain experts. In this paper, we analyze human and Computer Vision (CV) systems in parallel, training and explaining two ML models based respectively on functional Magnetic Resonance Imaging (fMRI) and movie frames. We do so by leveraging the "StudyForrest" dataset, which includes fMRI scans of subjects watching the "Forrest Gump" movie, emotion annotations, and eye-tracking data. For human vision, the ML task is to link fMRI data with emotional annotations, and the explanations highlight the brain regions strongly correlated with the label. For computer vision, on the other hand, the input data is movie frames, and the explanations are pixel-level heatmaps. We cross-analyze our results, linking human attention (obtained through eye-tracking) with XAI saliency on CV models and brain region activations. We show how a parallel analysis of human and computer vision can provide useful information for both the neuroscience community (allocation theory) and the ML community (biological plausibility of convolutional models).
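
The abstract does not name the decoding model or attribution technique, so the following is only a minimal sketch of the human-vision branch under stated assumptions: synthetic voxel vectors stand in for real StudyForrest fMRI volumes, a plain scikit-learn logistic regression stands in for the authors' decoder, and per-class weight magnitudes serve as a simple linear attribution over voxel locations.

    # Minimal brain-decoding sketch: predict an emotion label from a
    # voxel-activity vector. All data here is synthetic; the classifier and
    # attribution are stand-ins, not the authors' actual pipeline.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_volumes, n_voxels = 600, 2000               # fMRI volumes flattened to voxel vectors
    X = rng.standard_normal((n_volumes, n_voxels))
    y = rng.integers(0, 3, size=n_volumes)        # 3 hypothetical emotion classes

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    decoder = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out accuracy:", decoder.score(X_te, y_te))

    # For a linear decoder the attribution is direct: one weight map per class.
    # Reshaped back into the brain volume, large |weights| mark the voxels
    # (and hence brain regions) most associated with each emotion label.
    weight_maps = decoder.coef_                   # shape (n_classes, n_voxels)
    top = np.argsort(-np.abs(weight_maps[0]))[:10]
    print("10 most influential voxels for class 0:", top)

A real pipeline would replace the synthetic arrays with preprocessed BOLD time series aligned to the emotion annotations, and a nonlinear decoder would require a dedicated attribution method rather than raw weights.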

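On the computer-vision side, the abstract only states that the explanations are pixel-level heatmaps later compared with eye-tracking. As a hedged illustration, the sketch below uses an untrained off-the-shelf ResNet-18 in place of the authors' CNN, vanilla gradient saliency in place of their (unspecified) attribution method, and random tensors in place of real movie frames and fixation maps; it then correlates the model heatmap with a gaze-density map, mirroring the paper's human-attention comparison.

    # Pixel-level saliency for a frame classifier, plus a simple correlation
    # with a (hypothetical) eye-tracking gaze-density map. Everything below is
    # a stand-in: untrained backbone, random frame, random fixations.
    import torch
    import torchvision.models as models

    model = models.resnet18(weights=None)          # swap in a trained emotion CNN
    model.eval()

    frame = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder movie frame
    logits = model(frame)
    score = logits[0, logits.argmax(dim=1).item()] # score of the predicted class
    score.backward()

    # Vanilla gradient saliency: |d score / d pixel|, maxed over color channels.
    heatmap = frame.grad.abs().max(dim=1).values[0]          # (224, 224)

    # Pearson correlation between XAI saliency and human gaze density.
    gaze_density = torch.rand(224, 224)            # placeholder fixation map
    s = heatmap.flatten() - heatmap.mean()
    g = gaze_density.flatten() - gaze_density.mean()
    pearson_r = (s @ g) / (s.norm() * g.norm())
    print(f"saliency vs. gaze correlation: {pearson_r.item():.3f}")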

URL

https://arxiv.org/abs/2408.00493

PDF

https://arxiv.org/pdf/2408.00493.pdf

