Paper Reading AI Learner

Tracking objects that change in appearance with phase synchrony

2024-10-02 23:30:05
Sabine Muzellec, Drew Linsley, Alekh K. Ashok, Ennio Mingolla, Girik Malik, Rufin VanRullen, Thomas Serre

Abstract

Objects we encounter often change appearance as we interact with them. Changes in illumination (shadows), object pose, or movement of nonrigid objects can drastically alter available image features. How do biological visual systems track objects as they change? It may involve specific attentional mechanisms for reasoning about the locations of objects independently of their appearances -- a capability that prominent neuroscientific theories have associated with computing through neural synchrony. We computationally test the hypothesis that the implementation of visual attention through neural synchrony underlies the ability of biological visual systems to track objects that change in appearance over time. We first introduce a novel deep learning circuit that can learn to precisely control attention to features separately from their location in the world through neural synchrony: the complex-valued recurrent neural network (CV-RNN). Next, we compare object tracking in humans, the CV-RNN, and other deep neural networks (DNNs), using FeatureTracker: a large-scale challenge that asks observers to track objects as their locations and appearances change in precisely controlled ways. While humans effortlessly solved FeatureTracker, state-of-the-art DNNs did not. In contrast, our CV-RNN behaved similarly to humans on the challenge, providing a computational proof-of-concept for the role of phase synchronization as a neural substrate for tracking appearance-morphing objects as they move about.
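
To make the idea of binding by phase concrete, here is a minimal sketch. It is not the authors' CV-RNN and assumes nothing about its architecture. It only illustrates the premise that a complex-valued unit can carry feature strength in its amplitude and object identity in its phase, so units belonging to one object can stay synchronized while their amplitudes (appearance) vary. The helper names, the example values, and the use of the Kuramoto order parameter as the synchrony measure are all assumptions made for this illustration.

```python
import cmath
import math


def complex_activation(amplitude, phase):
    """Hypothetical unit: amplitude = feature strength, phase = object tag."""
    return cmath.rect(amplitude, phase)


def phase_synchrony(phases):
    """Kuramoto order parameter |mean(exp(i * phase))|, a value in [0, 1].

    1 means all phases are identical; values near 0 mean the phases cancel.
    """
    if not phases:
        raise ValueError("need at least one phase")
    mean_phasor = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_phasor)


# Illustrative values only: the target's units differ in amplitude
# (its appearance varies) but share nearly the same phase.
target_units = [
    complex_activation(a, 0.1 + jitter)
    for a, jitter in [(0.9, 0.0), (0.3, 0.05), (0.6, -0.05)]
]
# A distractor's units sit roughly half a cycle away.
distractor_units = [
    complex_activation(a, math.pi + jitter)
    for a, jitter in [(0.8, 0.0), (0.5, 0.05)]
]

target_phases = [cmath.phase(z) for z in target_units]
distractor_phases = [cmath.phase(z) for z in distractor_units]

# High synchrony within one object, low synchrony across objects.
within = phase_synchrony(target_phases)
across = phase_synchrony(target_phases + distractor_phases)
print(f"within-object synchrony: {within:.3f}")
print(f"across-object synchrony: {across:.3f}")
```

Because the synchrony measure ignores amplitude, the target's units remain grouped even though their feature strengths differ, which is the property the abstract attributes to attention implemented through neural synchrony.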

URL

https://arxiv.org/abs/2410.02094

PDF

https://arxiv.org/pdf/2410.02094.pdf

