Abstract
Micro-action is an imperceptible non-verbal behaviour characterised by low-intensity movement. It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment. However, identifying, differentiating, and understanding micro-actions is challenging because these subtle human behaviours are imperceptible and inaccessible in everyday life. In this study, we collect a new micro-action dataset designated Micro-action-52 (MA-52) and propose a benchmark named micro-action network (MANet) for the micro-action recognition (MAR) task. Uniquely, MA-52 provides a whole-body perspective, including gestures and upper- and lower-limb movements, aiming to reveal comprehensive micro-action cues. In detail, MA-52 contains 52 micro-action categories along with seven body-part labels, and encompasses a full array of realistic and natural micro-actions, comprising 205 participants and 22,422 video instances collected from psychological interviews. On the proposed dataset, we assess MANet and nine other prevalent action recognition methods. MANet incorporates squeeze-and-excitation (SE) and temporal shift module (TSM) components into the ResNet architecture to model the spatiotemporal characteristics of micro-actions. A joint-embedding loss is then designed for semantic matching between videos and action labels; this loss helps distinguish visually similar yet distinct micro-action categories. An extended application to emotion recognition demonstrates one of the key values of the proposed dataset and method. In future work, human behaviour, emotion, and psychological assessment will be explored in greater depth. The dataset and source code are released at this https URL.
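The two building blocks the abstract names, the temporal shift module (TSM) and squeeze-and-excitation (SE), can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration of the published TSM and SE operations in general, not the authors' MANet code; the function names, the `shift_div` fraction, and the weight shapes are hypothetical choices for clarity.

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """TSM sketch: swap a fraction of channels between neighbouring
    frames, adding temporal mixing at zero extra compute.
    x: (T, C, H, W) clip features; 1/shift_div of channels shift each way."""
    t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                   # first fold: shift forward in time
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]   # second fold: shift backward
    out[:, 2 * fold:] = x[:, 2 * fold:]              # remaining channels unchanged
    return out

def squeeze_excitation(x, w1, w2):
    """SE sketch: reweight channels with a gate computed from
    globally pooled features.
    x: (T, C, H, W); w1: (C, C//r); w2: (C//r, C) with reduction ratio r."""
    s = x.mean(axis=(2, 3))                  # squeeze: global average pool -> (T, C)
    e = np.maximum(s @ w1, 0.0)              # excitation: FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(e @ w2)))   # FC + sigmoid -> per-channel gate in (0, 1)
    return x * gate[:, :, None, None]        # rescale each channel map
```

In a TSM-style network these operations are inserted inside ResNet residual blocks, so the 2D convolutions that follow the shift see features from adjacent frames while the SE gate emphasises the channels most informative for the low-intensity movements MAR targets.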
URL
https://arxiv.org/abs/2403.05234