Spontaneous Spatial Cognition Emerges during Egocentric Video Viewing through Non-invasive BCI

2025-07-16 17:07:57
Weichen Dai, Yuxuan Huang, Li Zhu, Dongjun Liu, Yu Zhang, Qibin Zhao, Andrzej Cichocki, Fabio Babiloni, Ke Li, Jianyu Qiu, Gangyong Jia, Wanzeng Kong, Qing Wu

Abstract

Humans possess a remarkable capacity for spatial cognition, allowing for self-localization even in novel or unfamiliar environments. While hippocampal neurons encoding position and orientation are well documented, the large-scale neural dynamics supporting spatial representation, particularly during naturalistic, passive experience, remain poorly understood. Here, we demonstrate for the first time that non-invasive brain-computer interfaces (BCIs) based on electroencephalography (EEG) can decode spontaneous, fine-grained egocentric 6D pose, comprising three-dimensional position and orientation, during passive viewing of egocentric video. Despite EEG's limited spatial resolution and high signal noise, we find that spatially coherent visual input (i.e., continuous and structured motion) reliably evokes decodable spatial representations, aligning with participants' subjective sense of spatial engagement. Decoding performance further improves when visual input is presented at 100 ms per image, suggesting alignment with intrinsic neural temporal dynamics. Using gradient-based backpropagation through a neural decoding model, we identify distinct EEG channels contributing to position-specific and orientation-specific components, revealing a distributed yet complementary neural encoding scheme. These findings indicate that the brain's spatial systems operate spontaneously and continuously, even under passive conditions, challenging traditional distinctions between active and passive spatial cognition. Our results offer a non-invasive window into the automatic construction of egocentric spatial maps and advance our understanding of how the human mind transforms everyday sensory experience into structured internal representations.
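
The abstract mentions gradient-based backpropagation through a decoding model to find which EEG channels contribute to position versus orientation. The following is a minimal sketch of that general idea, not the authors' method: it assumes a tiny untrained two-layer decoder with random stand-in weights, a hypothetical EEG window of 8 channels by 16 samples, and a 6D output whose first three entries are taken to be position and last three orientation. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not specify them.
N_CHANNELS, N_SAMPLES, N_HIDDEN, N_POSE = 8, 16, 12, 6

# Random, untrained stand-in weights for a two-layer decoder (assumption).
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_CHANNELS * N_SAMPLES))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_POSE, N_HIDDEN))
b2 = np.zeros(N_POSE)


def decode(x):
    """Map an EEG window of shape (channels, samples) to a 6D pose vector."""
    h = np.tanh(W1 @ x.ravel() + b1)
    return W2 @ h + b2


def input_gradient(x, out_idx):
    """Backpropagate the sum of the selected outputs to the input window."""
    h = np.tanh(W1 @ x.ravel() + b1)
    grad_y = np.zeros(N_POSE)
    grad_y[out_idx] = 1.0                 # d(sum of selected outputs)/dy
    grad_h = W2.T @ grad_y                # through the linear output layer
    grad_pre = grad_h * (1.0 - h ** 2)    # through tanh
    return (W1.T @ grad_pre).reshape(x.shape)


def channel_saliency(x, out_idx):
    """Mean absolute input gradient per channel (one score per channel)."""
    return np.abs(input_gradient(x, out_idx)).mean(axis=1)


x = rng.normal(size=(N_CHANNELS, N_SAMPLES))    # synthetic EEG window
pos_saliency = channel_saliency(x, [0, 1, 2])   # assumed position outputs
ori_saliency = channel_saliency(x, [3, 4, 5])   # assumed orientation outputs
print("top position channel:", int(pos_saliency.argmax()))
print("top orientation channel:", int(ori_saliency.argmax()))
```

With a trained decoder and real recordings, comparing the two saliency vectors across many windows would be one way to ask whether different channels drive the position and orientation outputs; with the random weights used here the scores carry no neuroscientific meaning.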

URL

https://arxiv.org/abs/2507.12417

PDF

https://arxiv.org/pdf/2507.12417.pdf

