HANDS: A Multimodal Dataset for Modeling Towards Human Grasp Intent Inference in Prosthetic Hands

2021-03-08 15:51:03

Mo Han, Sezen Ya{ğ}mur Günay, Gunar Schirner, Taşkın Padır, Deniz Erdo{ğ}muş

arXiv_AI

arXiv_AI CNN Inference

Abstract
Abstract (translated)
URL
PDF

Abstract

Upper limb and hand functionality is critical to many activities of daily living and the amputation of one can lead to significant functionality loss for individuals. From this perspective, advanced prosthetic hands of the future are anticipated to benefit from improved shared control between a robotic hand and its human user, but more importantly from the improved capability to infer human intent from multimodal sensor data to provide the robotic hand perception abilities regarding the operational context. Such multimodal sensor data may include various environment sensors including vision, as well as human physiology and behavior sensors including electromyography and inertial measurement units. A fusion methodology for environmental state and human intent estimation can combine these sources of evidence in order to help prosthetic hand motion planning and control. In this paper, we present a dataset of this type that was gathered with the anticipation of cameras being built into prosthetic hands, and computer vision methods will need to assess this hand-view visual evidence in order to estimate human intent. Specifically, paired images from human eye-view and hand-view of various objects placed at different orientations have been captured at the initial state of grasping trials, followed by paired video, EMG and IMU from the arm of the human during a grasp, lift, put-down, and retract style trial structure. For each trial, based on eye-view images of the scene showing the hand and object on a table, multiple humans were asked to sort in decreasing order of preference, five grasp types appropriate for the object in its given configuration relative to the hand. The potential utility of paired eye-view and hand-view images was illustrated by training a convolutional neural network to process hand-view images in order to predict eye-view labels assigned by humans.

Abstract (translated)

URL

https://arxiv.org/abs/2103.04845

PDF

https://arxiv.org/pdf/2103.04845.pdf