Abstract
Decoding natural visual scenes from brain activity has flourished, with extensive research on single-subject tasks but far less on cross-subject tasks. Reconstructing high-quality images across subjects is challenging due to profound individual differences between subjects and the scarcity of annotated data. In this work, we propose MindTuner for cross-subject visual decoding, which achieves high-quality, semantically rich reconstructions from only 1 hour of fMRI training data by exploiting the phenomenon of visual fingerprints in the human visual system and a novel fMRI-to-text alignment paradigm. First, we pre-train a multi-subject model on 7 subjects and fine-tune it with scarce data on new subjects, using LoRAs with Skip-LoRAs to learn the visual fingerprint. Then, we take the image modality as an intermediate pivot to achieve fMRI-to-text alignment, which yields impressive fMRI-to-text retrieval performance and corrects fMRI-to-image reconstruction with fine-tuned semantics. Both qualitative and quantitative analyses demonstrate that MindTuner surpasses state-of-the-art cross-subject visual decoding models on the Natural Scenes Dataset (NSD), whether trained on 1 hour or 40 hours of data.
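The abstract's fine-tuning setup can be pictured as low-rank adapters attached to a frozen pre-trained model. The following is a minimal NumPy sketch of that idea, not the paper's implementation: all names and shapes are hypothetical, and "Skip-LoRA" is rendered here as an assumed second low-rank path from the raw input that bypasses the frozen layer.

```python
import numpy as np

# Hypothetical sketch of LoRA-style per-subject adapters on a frozen
# shared layer. Shapes, init scales, and the Skip-LoRA structure are
# assumptions for illustration, not the paper's actual architecture.

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 32, 4

# Shared weight pre-trained on the 7-subject data; frozen during fine-tuning.
W_frozen = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA factors: only these small matrices are trained for a new subject.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))  # conventional LoRA init: B = 0, so the adapter starts as a no-op

# Assumed "Skip-LoRA": a second low-rank path from the input that
# skips the frozen layer, capturing subject-specific structure.
A_skip = rng.standard_normal((rank, d_in)) * 0.01
B_skip = np.zeros((d_out, rank))

def forward(x: np.ndarray) -> np.ndarray:
    # frozen path + LoRA correction + skip-path correction
    return W_frozen @ x + B @ (A @ x) + B_skip @ (A_skip @ x)

x = rng.standard_normal(d_in)
y = forward(x)
print(y.shape)  # (32,)
```

With both B matrices zero-initialized, the adapted model initially reproduces the frozen multi-subject model exactly; fine-tuning on the new subject's scarce data then only has to learn the low-rank deviation, which is one common rationale for LoRA-style transfer.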
URL
https://arxiv.org/abs/2404.12630