A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera

2024-04-13 00:13:20
Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Olivier Coenen

Abstract

Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical. To interface with such data and leverage their rich temporal features, we propose a causal spatiotemporal convolutional network. This solution targets efficient implementation on edge-appropriate hardware with limited resources in three ways: 1) it deliberately uses a simple architecture and set of operations (convolutions, ReLU activations); 2) it can be configured to perform online inference efficiently via buffering of layer outputs; 3) it can achieve more than 90% activation sparsity through regularization during training, enabling very significant efficiency gains on event-based processors. In addition, we propose a general affine augmentation strategy acting directly on the events, which alleviates the problem of dataset scarcity for event-based systems. We apply our model to the AIS 2024 event-based eye tracking challenge, reaching a p10 accuracy of 0.9916 on the Kaggle private test set.
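
The abstract's second point, online inference via buffering of layer outputs, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single depthwise causal temporal convolution over per-frame feature vectors, and the names `causal_conv_offline` and `StreamingCausalConv` are hypothetical. The idea shown is that a causal kernel of length K only needs the last K inputs, so a layer can keep them in a small FIFO buffer and produce one output per incoming frame instead of reprocessing the whole sequence.

```python
import numpy as np


def causal_conv_offline(x, w):
    """Causal depthwise temporal convolution over a full sequence.

    x: (T, C) array of per-frame features.
    w: (K, C) array, one temporal kernel of length K per channel.
    Returns a (T, C) array where y[t] depends only on x[t-K+1 .. t].
    """
    K = w.shape[0]
    # Left-pad with zeros so that no output ever looks at future frames.
    padded = np.concatenate([np.zeros((K - 1, x.shape[1])), x], axis=0)
    return np.stack([(padded[t:t + K] * w).sum(axis=0) for t in range(x.shape[0])])


class StreamingCausalConv:
    """Same layer, evaluated one frame at a time with a FIFO buffer."""

    def __init__(self, w):
        self.w = w
        # Holds the last K inputs, oldest first; zeros play the role of padding.
        self.buffer = np.zeros_like(w)

    def step(self, frame):
        # Drop the oldest frame, append the newest, then apply the kernel.
        self.buffer = np.roll(self.buffer, -1, axis=0)
        self.buffer[-1] = frame
        return (self.buffer * self.w).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(20, 4))   # 20 frames, 4 channels (illustrative sizes)
    w = rng.normal(size=(3, 4))    # kernel length 3 (illustrative)

    offline = causal_conv_offline(x, w)
    layer = StreamingCausalConv(w)
    online = np.stack([layer.step(frame) for frame in x])
    print(np.allclose(offline, online))
```

Under these assumptions the streaming layer does a fixed amount of work per frame and stores only K frames of state, which is the property that makes the buffered configuration attractive for low-latency edge inference.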

URL

https://arxiv.org/abs/2404.08858

PDF

https://arxiv.org/pdf/2404.08858.pdf

