Paper Reading AI Learner

A Survey of Body and Face Motion: Datasets, Performance Evaluation Metrics and Generative Techniques

2025-12-09 11:50:44
Lownish Rai Sookha, Nikhil Pakhale, Mudasir Ganaie, Abhinav Dhall

Abstract

Body and face motion play an integral role in communication. They convey crucial information about the participants. Advances in generative modeling and multi-modal learning have enabled motion generation from signals such as speech, conversational context and visual cues. However, generating expressive and coherent face and body dynamics remains challenging due to the complex interplay of verbal and non-verbal cues and individual personality traits. This survey reviews body and face motion generation, covering core concepts, representation techniques, generative approaches, datasets and evaluation metrics. We highlight future directions to enhance the realism, coherence and expressiveness of avatars in dyadic settings. To the best of our knowledge, this work is the first comprehensive review to cover both body and face motion. Detailed resources are listed at this https URL.

URL

https://arxiv.org/abs/2512.09005

PDF

https://arxiv.org/pdf/2512.09005.pdf