Abstract
This paper presents a novel method that improves the conversational interaction abilities of intelligent robots by enabling more realistic body gestures. The sequence-to-sequence (seq2seq) model is adapted to synthesize the robots' body gestures, represented by the movements of twelve upper-body keypoints, not only in the speaking phase but also in the listening phase, which previous methods can hardly handle. We collected and preprocessed a substantial number of human conversation videos from YouTube to train our seq2seq-based models, and evaluated them by mean squared error (MSE) and cosine similarity on the test set. The tuned models were deployed to drive a virtual avatar as well as a physical humanoid robot, demonstrating in practice the improvement in interaction abilities that our method provides. With body gestures synthesized by our models, the avatar and the Pepper robot behaved more intelligently while communicating with humans.
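The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the pose layout (a sequence of frames, each holding twelve 2D keypoints), the flattening of a whole sequence into one vector, and the function names `flatten`, `mse`, and `cosine_similarity` are assumptions made here for illustration.

```python
import math


def flatten(frames):
    """Flatten frames of (x, y) keypoints into one flat list of coordinates."""
    return [coord for frame in frames for point in frame for coord in point]


def mse(pred, target):
    """Mean squared error over all keypoint coordinates of two pose sequences."""
    p, t = flatten(pred), flatten(target)
    if len(p) != len(t) or not p:
        raise ValueError("sequences must be non-empty and have the same shape")
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


def cosine_similarity(pred, target):
    """Cosine similarity between two pose sequences treated as flat vectors."""
    p, t = flatten(pred), flatten(target)
    if len(p) != len(t) or not p:
        raise ValueError("sequences must be non-empty and have the same shape")
    dot = sum(a * b for a, b in zip(p, t))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Hypothetical data: one frame with twelve upper-body keypoints (assumed 2D).
target = [[(0.1 * i, 1.0) for i in range(12)]]
# A prediction whose keypoints are all shifted by 0.1 along x.
pred = [[(x + 0.1, y) for (x, y) in target[0]]]

print("MSE:", mse(pred, target))
print("cosine similarity:", cosine_similarity(pred, target))
```

A lower MSE means the predicted keypoints lie closer to the ground truth, while a cosine similarity near 1 means the two pose vectors point in nearly the same direction regardless of scale, so the two metrics capture somewhat different aspects of gesture quality.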
URL
https://arxiv.org/abs/1905.01641