Abstract
We present a new research task and a dataset for understanding human social interactions through computational methods, with the ultimate goal of endowing machines with the ability to encode and decode the broad range of social signals that humans use. This research direction is essential for building machines that genuinely communicate with humans, a capability we call Social Artificial Intelligence. We first formulate the "social signal prediction" problem as a data-driven way to model the dynamics of social signals exchanged among interacting individuals. We then present a new 3D motion capture dataset for exploring this problem, in which a broad spectrum of social signals (3D body, face, and hand motions) is captured in a triadic social interaction scenario. Finally, we present baseline approaches for predicting the speaking status, social formation, and body gestures of interacting individuals within the defined social signal prediction framework.
URL
https://arxiv.org/abs/1906.04158