How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language

2020-08-18 20:22:16

Amanda Duarte, Shruti Palaskar, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres, Xavier Giro-i-Nieto

arXiv_CV Recognition Pose_Estimation Pose 3D Reconstruction Speech

Abstract

Sign language is the primary means of communication for the majority of the Deaf community. One factor that has hindered progress in automatic sign language recognition, generation, and translation is the absence of large annotated datasets, especially continuous sign language datasets, i.e., datasets that are annotated and segmented at the sentence or utterance level. To this end, we introduce How2Sign, a work-in-progress dataset collection. How2Sign consists of a parallel corpus of 80 hours of sign language videos (collected with multi-view RGB and depth sensor data) with corresponding speech transcriptions and gloss annotations. In addition, a three-hour subset was further recorded in a geodesic dome setup using hundreds of cameras and sensors, which enables detailed 3D reconstruction and pose estimation and paves the way for vision systems to understand the 3D geometry of sign language.

URL

https://arxiv.org/abs/2008.08143

PDF

https://arxiv.org/pdf/2008.08143.pdf