Abstract
Affective tactile interaction constitutes a fundamental component of human communication. In natural human-human encounters, touch is seldom experienced in isolation; rather, it is inherently multisensory. Individuals not only perceive the physical sensation of touch but also register the accompanying auditory cues generated through contact. The integration of haptic and auditory information forms a rich and nuanced channel for emotional expression. While extensive research has examined how robots convey emotions through facial expressions and speech, their capacity to communicate social gestures and emotions via touch remains largely underexplored. To address this gap, we developed a multimodal interaction system incorporating a 5×5 grid of 25 vibration motors synchronized with audio playback, enabling robots to deliver combined haptic-audio stimuli. In an experiment involving 32 Chinese participants, ten emotions and six social gestures were presented through vibration, sound, or their combination. Participants rated each stimulus on arousal and valence scales. The results revealed that (1) the combined haptic-audio modality significantly enhanced decoding accuracy compared with either single modality; (2) each individual channel, vibration or sound, effectively supported the recognition of certain emotions, with distinct advantages depending on the emotion expressed; and (3) gestures alone were generally insufficient for conveying clearly distinguishable emotions. These findings underscore the importance of multisensory integration in affective human-robot interaction and highlight the complementary roles of haptic and auditory cues in enhancing emotional communication.
URL
https://arxiv.org/abs/2508.07839