Paper Reading AI Learner

Morpheus: A Neural-driven Animatronic Face with Hybrid Actuation and Diverse Emotion Control

2025-07-22 14:42:49
Zongzheng Zhang, Jiawen Yang, Ziqiao Peng, Meng Yang, Jianzhu Ma, Lin Cheng, Huazhe Xu, Hang Zhao, Hao Zhao

Abstract

Previous animatronic faces struggle to express emotions effectively due to hardware and software limitations. On the hardware side, earlier approaches either use rigid-driven mechanisms, which provide precise control but are difficult to design within constrained spaces, or tendon-driven mechanisms, which are more space-efficient but challenging to control. In contrast, we propose a hybrid actuation approach that combines the best of both worlds. The eyes and mouth, key areas for emotional expression, are controlled by rigid mechanisms for precise movement, while the nose and cheeks, which convey subtle facial microexpressions, are driven by strings. This design lets us build a compact yet versatile hardware platform capable of expressing a wide range of emotions. On the algorithmic side, our method introduces a self-modeling network that maps motor actions to facial landmarks, allowing us to automatically establish, through gradient backpropagation, the relationship between the blendshape coefficients of different facial expressions and the corresponding motor control signals. We then train a neural network to map speech input to the corresponding blendshape controls. With our method, we can generate distinct emotional expressions such as happiness, fear, disgust, and anger from any given sentence, each with nuanced, emotion-specific control signals, a feature that has not been demonstrated in earlier systems. We release the hardware design and code at this https URL and this https URL.
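The core algorithmic idea in the abstract, inverting a learned self-model by backpropagating landmark error into motor commands, can be sketched in a few lines. This is a hedged toy illustration, not the authors' released code: the paper's self-model is a trained neural network, while here a fixed random linear map `W` stands in for it so the gradient is analytic; the motor/landmark dimensions and the target (standing in for landmarks decoded from blendshape coefficients) are made up for the example.

```python
import numpy as np

# Toy "self-model": maps motor commands to facial-landmark offsets.
# (In the paper this is a learned network; a linear map is assumed here.)
rng = np.random.default_rng(0)
n_motors, n_landmarks = 4, 6
W = rng.normal(size=(n_landmarks, n_motors))

def self_model(u):
    return W @ u  # predicted landmark offsets for motor command u

# Target landmarks, standing in for landmarks derived from
# blendshape coefficients of a desired expression.
u_true = rng.normal(size=n_motors)
target = self_model(u_true)

# Invert the self-model by gradient descent on the landmark error,
# mirroring backpropagation through a frozen self-model.
u = np.zeros(n_motors)
lr = 0.01
for _ in range(5000):
    err = self_model(u) - target
    u -= lr * (W.T @ err)  # analytic gradient of 0.5 * ||err||^2

residual = float(np.linalg.norm(self_model(u) - target))
print(residual)  # small: recovered motor commands reproduce the target
```

Because the self-model is differentiable, the same loop works unchanged when `self_model` is a neural network and the gradient comes from autodiff; only the analytic `W.T @ err` line changes.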


URL

https://arxiv.org/abs/2507.16645

PDF

https://arxiv.org/pdf/2507.16645.pdf

