Abstract
Human eye-gaze estimation is an important cognitive ingredient for successful human-robot interaction, enabling the robot to read and predict human behavior. We approach this problem with artificial neural networks and build a modular system that estimates gaze from separately cropped eyes, taking advantage of existing well-functioning components for face detection (RetinaFace) and head-pose estimation (6DRepNet). Our method requires no special hardware or infrared filters; like other appearance-based methods, it uses a standard built-in notebook RGB camera. Using the MetaHuman tool, we also generated a large synthetic dataset of more than 57,000 human faces and made it publicly available. Adding this dataset (with eye-gaze and head-pose annotations) to the standard Columbia Gaze dataset during training improved accuracy, yielding a mean absolute error below two degrees in eye pitch and yaw, which compares favourably with related methods. We also verified the feasibility of our model through preliminary testing in a real-world setting using the built-in 4K camera in the eye of the NICO semi-humanoid robot.
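The modular pipeline described above (face detection, eye cropping, head-pose estimation, per-eye gaze regression) can be sketched as follows. This is a minimal illustration, not the authors' implementation: all component functions are hypothetical stubs standing in for RetinaFace, 6DRepNet, and the trained gaze network, and the final combination of eye-in-head gaze with head pose is a simplified assumption.

```python
import numpy as np

def detect_face(frame):
    """Stub for a RetinaFace-style detector: returns a face box and
    five (x, y) landmarks (eyes, nose, mouth corners) in pixels."""
    h, w = frame.shape[:2]
    box = (w // 4, h // 4, w // 2, h // 2)        # x, y, width, height
    landmarks = np.array([[w * 0.4, h * 0.4],      # left eye
                          [w * 0.6, h * 0.4],      # right eye
                          [w * 0.5, h * 0.5],      # nose tip
                          [w * 0.45, h * 0.6],     # left mouth corner
                          [w * 0.55, h * 0.6]])    # right mouth corner
    return box, landmarks

def crop_eye(frame, center, size=36):
    """Crop a fixed-size square patch around an eye landmark."""
    x, y = int(center[0]), int(center[1])
    r = size // 2
    return frame[y - r:y + r, x - r:x + r]

def estimate_head_pose(frame, box):
    """Stub for a 6DRepNet-style regressor: (pitch, yaw, roll) in degrees."""
    return np.zeros(3)

def estimate_eye_gaze(eye_patch):
    """Stub for the per-eye gaze network: (pitch, yaw) in degrees."""
    return np.zeros(2)

def gaze_pipeline(frame):
    """Run the full modular pipeline on one RGB frame."""
    box, lm = detect_face(frame)
    head = estimate_head_pose(frame, box)
    # Average the two per-eye estimates (a simplifying assumption).
    gaze = np.mean([estimate_eye_gaze(crop_eye(frame, lm[0])),
                    estimate_eye_gaze(crop_eye(frame, lm[1]))], axis=0)
    # Combine eye-in-head gaze with head pose (pitch and yaw only).
    return gaze + head[:2]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy RGB camera frame
print(gaze_pipeline(frame).shape)                 # (2,): pitch and yaw
```

Each stub can be swapped for the corresponding pretrained component without changing the surrounding control flow, which is the practical appeal of the modular design.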
URL
https://arxiv.org/abs/2311.14175