Abstract
The advent of foundation models signals a new era in artificial intelligence. The Segment Anything Model (SAM) is the first foundation model for image segmentation. In this study, we evaluate SAM's ability to segment features from eye images recorded in virtual reality setups. The growing demand for annotated eye-image datasets presents a significant opportunity for SAM to redefine the landscape of data annotation in gaze estimation. Our investigation centers on SAM's zero-shot learning abilities and the effectiveness of prompts such as bounding boxes or point clicks. Our results are consistent with studies in other domains, demonstrating that, depending on the feature, SAM's segmentation effectiveness can be on par with that of specialized models and that prompts improve its performance, as evidenced by an IoU of 93.34% for pupil segmentation on one dataset. Foundation models like SAM could revolutionize gaze estimation by enabling quick and easy image segmentation, reducing reliance on specialized models and extensive manual annotation.
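To make the prompt-based, zero-shot workflow described in the abstract concrete, the sketch below shows how a single point click (a bounding box could be passed via the box argument instead) can be fed to SAM's promptable predictor and how the resulting mask can be scored against a ground-truth pupil mask with IoU. This is a minimal illustration, not the paper's actual evaluation pipeline: it assumes the official segment_anything package and its ViT-H checkpoint, and the image path, ground-truth mask file, and prompt coordinates are hypothetical.

import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load SAM (assumed local ViT-H checkpoint) and wrap it in a promptable predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read a hypothetical eye image (RGB, uint8) and compute its embedding once.
image = cv2.cvtColor(cv2.imread("eye.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground click placed on the pupil (illustrative coordinates).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),   # 1 = foreground click
    multimask_output=True,
)
pred = masks[np.argmax(scores)]   # keep the highest-scoring candidate mask

# Score against a hypothetical binary ground-truth pupil mask: IoU = |A ∩ B| / |A ∪ B|.
gt = cv2.imread("pupil_gt.png", cv2.IMREAD_GRAYSCALE) > 0
iou = np.logical_and(pred, gt).sum() / np.logical_or(pred, gt).sum()
print(f"Pupil IoU: {iou:.4f}")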
Abstract (translated)
The emergence of foundation models marks a new era in artificial intelligence. SAM is the first foundation model for image segmentation. In this study, we evaluate SAM's ability to segment features from eye images recorded in virtual reality setups. The growing demand for annotated eye-image datasets offers SAM a significant opportunity to redefine the landscape of data annotation. Our investigation focuses on SAM's zero-shot learning capabilities and the effectiveness of prompts such as bounding boxes or point clicks. Our results are consistent with findings in other domains, showing that SAM's segmentation performance can rival that of specialized models depending on the feature, and that prompts improve its performance, as illustrated by an IoU of 93.34% for pupil segmentation on one dataset. Foundation models such as SAM could transform gaze estimation by enabling fast and easy image segmentation, reducing reliance on specialized models and extensive manual annotation.
URL
https://arxiv.org/abs/2311.08077