Touch Speaks, Sound Feels: A Multimodal Approach to Affective and Social Touch from Robots to Humans

2025-08-11 10:45:43
Qiaoqiao Ren, Tony Belpaeme

Abstract

Affective tactile interaction constitutes a fundamental component of human communication. In natural human-human encounters, touch is seldom experienced in isolation; rather, it is inherently multisensory. Individuals not only perceive the physical sensation of touch but also register the accompanying auditory cues generated through contact. The integration of haptic and auditory information forms a rich and nuanced channel for emotional expression. While extensive research has examined how robots convey emotions through facial expressions and speech, their capacity to communicate social gestures and emotions via touch remains largely underexplored. To address this gap, we developed a multimodal interaction system incorporating a 5×5 grid of 25 vibration motors synchronized with audio playback, enabling robots to deliver combined haptic-audio stimuli. In an experiment involving 32 Chinese participants, ten emotions and six social gestures were presented through vibration, sound, or their combination. Participants rated each stimulus on arousal and valence scales. The results revealed that (1) the combined haptic-audio modality significantly enhanced decoding accuracy compared to single modalities; (2) each individual channel, vibration or sound, effectively supported the recognition of certain emotions, with distinct advantages depending on the emotional expression; and (3) gestures alone were generally insufficient for conveying clearly distinguishable emotions. These findings underscore the importance of multisensory integration in affective human-robot interaction and highlight the complementary roles of haptic and auditory cues in enhancing emotional communication.

URL

https://arxiv.org/abs/2508.07839

PDF

https://arxiv.org/pdf/2508.07839.pdf

