
Towards More Realistic Human-Robot Conversation: A Seq2Seq-based Body Gesture Interaction System

2019-05-05 09:53:29
Minjie Hua, Fuyuan Shi, Yibing Nan, Kai Wang, Hao Chen, Shiguo Lian

Abstract

This paper presents a novel method that improves the conversational interaction abilities of intelligent robots by enabling more realistic body gestures. A sequence-to-sequence (seq2seq) model is adapted to synthesize the robot's body gestures, represented by the movements of twelve upper-body keypoints, not only in the speaking phase but also in the listening phase, which previous methods can hardly handle. We collected and preprocessed a substantial number of human conversation videos from YouTube to train our seq2seq-based models, and evaluated them by mean squared error (MSE) and cosine similarity on the test set. The tuned models were deployed to drive a virtual avatar as well as a physical humanoid robot, demonstrating in practice the improvement in interaction abilities that our method provides. With body gestures synthesized by our models, the avatar and the Pepper robot behaved more intelligently while communicating with humans.
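The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the data layout, so this example assumes each frame is a list of twelve (x, y) keypoints and that a gesture sequence is flattened into a single vector before scoring. The helper names `flatten`, `mse`, and `cosine_similarity` are hypothetical and are not taken from the paper.

```python
import math


def flatten(frames):
    """Flatten a list of frames, each a list of (x, y) keypoints, into one flat list.

    The (x, y) layout is an assumption; the abstract only says twelve keypoints.
    """
    return [coord for frame in frames for keypoint in frame for coord in keypoint]


def mse(pred, target):
    """Mean squared error between two equal-length flat sequences of numbers."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine_similarity(pred, target):
    """Cosine of the angle between two equal-length flat sequences of numbers."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_p == 0 or norm_t == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_p * norm_t)


# Toy data: one frame of twelve keypoints, with the prediction shifted slightly.
ground_truth = [[(float(i), float(i) + 1.0) for i in range(12)]]
prediction = [[(x + 0.5, y) for (x, y) in ground_truth[0]]]

truth_vec = flatten(ground_truth)
pred_vec = flatten(prediction)

print("MSE:", mse(pred_vec, truth_vec))
print("Cosine similarity:", cosine_similarity(pred_vec, truth_vec))
```

A lower MSE means the predicted keypoint positions are closer to the ground truth, while a cosine similarity near 1 means the predicted and true vectors point in nearly the same direction. Whether the paper computes these per frame, per keypoint, or over whole sequences is not stated in the abstract.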

URL

https://arxiv.org/abs/1905.01641

PDF

https://arxiv.org/pdf/1905.01641.pdf

