
TranSTYLer: Multimodal Behavioral Style Transfer for Facial and Body Gestures Generation

2023-08-08 15:42:35
Mireille Fares, Catherine Pelachaud, Nicolas Obin

Abstract

This paper addresses the challenge of transferring the behavior expressivity style of a virtual agent to another one while preserving the shape of behaviors, as they carry communicative meaning. Behavior expressivity style is viewed here as the qualitative properties of behaviors. We propose TranSTYLer, a multimodal transformer-based model that synthesizes the multimodal behaviors of a source speaker with the style of a target speaker. We assume that behavior expressivity style is encoded across various modalities of communication, including text, speech, body gestures, and facial expressions. The model employs a style and content disentanglement schema to ensure that the transferred style does not interfere with the meaning conveyed by the source behaviors. Our approach eliminates the need for style labels and allows generalization to styles that have not been seen during the training phase. We train our model on the PATS corpus, which we extended to include dialog acts and 2D facial landmarks. Objective and subjective evaluations show that our model outperforms state-of-the-art models in style transfer for both styles seen and unseen during training. To tackle the issues of style and content leakage that may arise, we propose a methodology to assess the degree to which behaviors and gestures associated with the target style are successfully transferred, while ensuring that those related to the source content are preserved.
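As a rough illustration of what a style and content disentanglement schema for behavior style transfer can look like, the PyTorch sketch below pairs a content encoder for the source speaker's multimodal sequence with a style encoder that pools a target speaker's example sequence into a single style vector (so no style label is needed), then decodes gesture features from the fused representation. All module names, dimensions, the GRU-based style pooling, and the additive fusion are illustrative assumptions, not the actual TranSTYLer architecture.

# Minimal sketch of a style/content disentanglement scheme for behavior style
# transfer. Everything below is an assumption-based illustration, not the
# TranSTYLer design described in the paper.
import torch
import torch.nn as nn


class ContentEncoder(nn.Module):
    """Encodes the source speaker's multimodal sequence (e.g. text and speech
    features) into content tokens that should remain style-agnostic."""

    def __init__(self, feat_dim=128, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        return self.encoder(self.proj(x))      # (batch, time, d_model)


class StyleEncoder(nn.Module):
    """Pools a target speaker's behavior sequence into one style vector, so
    styles are defined by example sequences rather than by labels."""

    def __init__(self, feat_dim=128, d_model=256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, y):                      # y: (batch, time, feat_dim)
        _, h = self.gru(self.proj(y))
        return h[-1]                           # (batch, d_model)


class StyleTransferModel(nn.Module):
    """Decodes facial/body gesture features conditioned on source content and
    target style (here the style vector is simply added to every token)."""

    def __init__(self, feat_dim=128, d_model=256, out_dim=64):
        super().__init__()
        self.content_enc = ContentEncoder(feat_dim, d_model)
        self.style_enc = StyleEncoder(feat_dim, d_model)
        self.decoder = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, out_dim)
        )

    def forward(self, source_seq, target_style_seq):
        content = self.content_enc(source_seq)                 # (B, T, D)
        style = self.style_enc(target_style_seq).unsqueeze(1)  # (B, 1, D)
        return self.decoder(content + style)                   # (B, T, out_dim)


if __name__ == "__main__":
    model = StyleTransferModel()
    src = torch.randn(2, 50, 128)      # source speaker features
    tgt = torch.randn(2, 80, 128)      # target speaker example (any length)
    print(model(src, tgt).shape)       # torch.Size([2, 50, 64])

The sketch only shows the structural split between the two encoders; the paper's disentanglement schema and its methodology for assessing style and content leakage go beyond this.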

Abstract (translated)

This paper explores how to transfer the behavior expressivity style of a virtual agent to another agent while preserving the shape of its behaviors, since they convey communicative meaning. Behavior expressivity style is viewed as the qualitative properties of behaviors. We propose TranSTYLer, a multimodal Transformer model that synthesizes the multimodal behaviors of a source speaker with the style of a target speaker. We consider that behavior expressivity style is encoded across the various modalities of communication, including text, speech, body gestures, and facial expressions. The model uses a style and content disentanglement framework to ensure that the transferred style does not interfere with the meaning conveyed by the source behaviors. Our approach requires no style labels and allows generalization to styles not seen during training. We train our model on the PATS corpus, which we extended with dialog acts and 2D facial landmarks. Objective and subjective evaluations show that our model outperforms state-of-the-art models for both styles seen and unseen during training. To address possible style and content leakage, we propose a methodology to assess the degree to which behaviors and gestures associated with the target style are successfully transferred, while ensuring that those related to the source content are preserved.

URL

https://arxiv.org/abs/2308.10843

PDF

https://arxiv.org/pdf/2308.10843.pdf

