Paper Reading AI Learner

Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN

2024-04-22 15:28:42
Baoheng Zhang, Yizhao Gao, Jingyuan Li, Hayden Kwok-Hay So

Abstract

Eye-tracking technology is integral to numerous consumer electronics applications, particularly in virtual and augmented reality (VR/AR). These applications demand solutions that excel in three crucial aspects: low latency, low power consumption, and high precision. Yet achieving optimal performance on all three fronts presents a formidable challenge, necessitating a balance between sophisticated algorithms and efficient backend hardware implementations. In this study, we tackle this challenge through a synergistic software/hardware co-design of the system with an event camera. Leveraging the inherent sparsity of event-based input data, we integrate a novel sparse FPGA dataflow accelerator customized for submanifold sparse convolutional neural networks (SCNNs). The SCNN implemented on the accelerator efficiently extracts an embedding feature vector from each event-slice representation by processing only the non-zero activations. These vectors are then further processed by a gated recurrent unit (GRU) and a fully connected layer on the host CPU to generate the eye centers. Deployment and evaluation of our system reveal outstanding performance: on the Event-based Eye-Tracking-AIS2024 dataset, it achieves 81% p5 accuracy, 99.5% p10 accuracy, and a mean Euclidean distance of 3.71 with 0.7 ms latency, while consuming only 2.29 mJ per inference. Notably, our solution opens up opportunities for future eye-tracking systems. Code is available at this https URL.
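The key property the abstract relies on is that a submanifold sparse convolution produces outputs only at input-active sites, so sparsity is preserved across layers and only non-zero activations are ever touched. Below is a minimal NumPy sketch of that idea; the coordinate/weight layout and function name are illustrative assumptions, not the paper's FPGA implementation.

```python
import numpy as np

def submanifold_conv2d(feats, weights, kernel_size=3):
    """Sketch of a submanifold sparse 2D convolution.

    feats:   dict (y, x) -> input feature vector of shape (C_in,),
             defined only at active (non-zero) sites
    weights: dict (dy, dx) -> weight matrix of shape (C_out, C_in)

    Outputs are computed only at sites that are already active, so the
    active-site set (the "submanifold") never dilates between layers,
    and inactive neighbours are skipped entirely.
    """
    k = kernel_size // 2
    out = {}
    for (y, x), f in feats.items():
        # Every active site contributes to itself via the (0, 0) tap.
        acc = weights[(0, 0)] @ f
        for dy in range(-k, k + 1):
            for dx in range(-k, k + 1):
                if (dy, dx) == (0, 0):
                    continue
                nb = (y + dy, x + dx)
                if nb in feats:  # only gather from active neighbours
                    acc = acc + weights[(dy, dx)] @ feats[nb]
        out[(y, x)] = acc
    return out
```

With an identity center tap and zero off-center taps, the output equals the input at exactly the same active sites, which makes the sparsity-preserving behaviour easy to check.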

URL

https://arxiv.org/abs/2404.14279

PDF

https://arxiv.org/pdf/2404.14279.pdf

