Towards Social Artificial Intelligence: Nonverbal Social Signal Prediction in A Triadic Interaction

2019-06-10 17:56:03
Hanbyul Joo, Tomas Simon, Mina Cikara, Yaser Sheikh

Abstract

We present a new research task and a dataset to understand human social interactions via computational methods, with the ultimate goal of endowing machines with the ability to encode and decode the broad channel of social signals that humans use. This research direction is essential for building a machine that genuinely communicates with humans, which we call Social Artificial Intelligence. We first formulate the "social signal prediction" problem as a way to model, in a data-driven way, the dynamics of social signals exchanged among interacting individuals. We then present a new 3D motion capture dataset to explore this problem, in which a broad spectrum of social signals (3D body, face, and hand motions) is captured in a triadic social interaction scenario. Baseline approaches to predicting the speaking status, social formation, and body gestures of interacting individuals are presented within the defined social signal prediction framework.

URL

https://arxiv.org/abs/1906.04158

PDF

https://arxiv.org/pdf/1906.04158.pdf

