Paper Reading AI Learner

Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences

2024-05-08 09:13:10
Cheng Song, Lu Lu, Zhen Ke, Long Gao, Shuai Ding

Abstract

Emotion recognition is an important part of affective computing. Extracting emotional cues from human gait offers benefits such as natural interaction, nonintrusiveness, and remote detection. Recently, self-supervised learning has offered a practical way to address the scarcity of labeled data in gait-based emotion recognition. However, because of the limited diversity of gaits and the incompleteness of skeleton feature representations, existing contrastive learning methods are usually inefficient at capturing emotional information from gait. In this paper, we propose a contrastive learning framework utilizing selective strong augmentation (SSA) for self-supervised gait-based emotion representation, which aims to derive effective representations from limited labeled gait data. First, we propose an SSA method for the gait emotion recognition task, which includes upper body jitter and random spatiotemporal masking. The goal of SSA is to generate more diverse and targeted positive samples and to prompt the model to learn more distinctive and robust feature representations. Then, we design a complementary feature fusion network (CFFN) that integrates cross-domain information to acquire topological structural features and global adaptive features. Finally, we implement a distributional divergence minimization loss to supervise the representation learning of the generally and strongly augmented queries. Our approach is validated on the Emotion-Gait (E-Gait) and Emilya datasets and outperforms state-of-the-art methods under different evaluation protocols.
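
The abstract names the SSA operations and the loss without giving formulas. What follows is a minimal NumPy sketch of what they could look like, written under explicit assumptions and not as the authors' implementation. The upper-body joint indices, the noise scale `sigma`, the mask ratios, the temperature `tau`, and the MoCo-style key `queue` are all hypothetical choices. The loss is written as the KL divergence between the similarity distributions of the generally and strongly augmented queries over a positive key plus a memory queue. This is one common way to realize distributional divergence minimization in skeleton contrastive learning, but the paper's exact formula is not confirmed here.

```python
import numpy as np

# Hypothetical upper-body joint indices: the abstract does not give the skeleton
# layout, so joints 0-7 are simply assumed to be the upper body.
UPPER_BODY_JOINTS = list(range(8))


def upper_body_jitter(seq, joints=UPPER_BODY_JOINTS, sigma=0.05, rng=None):
    """Add Gaussian noise to the selected (upper-body) joints only.

    seq has shape (T, V, C): frames, joints, coordinates. Returns a new array.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = seq.copy()
    noise = rng.normal(0.0, sigma, size=(seq.shape[0], len(joints), seq.shape[2]))
    out[:, joints, :] += noise
    return out


def random_spatiotemporal_mask(seq, joint_ratio=0.2, frame_ratio=0.2, rng=None):
    """Zero out a random subset of joints (spatial) and of frames (temporal)."""
    rng = np.random.default_rng() if rng is None else rng
    num_frames, num_joints, _ = seq.shape
    out = seq.copy()
    joints = rng.choice(num_joints, size=round(num_joints * joint_ratio), replace=False)
    frames = rng.choice(num_frames, size=round(num_frames * frame_ratio), replace=False)
    out[:, joints, :] = 0.0
    out[frames, :, :] = 0.0
    return out


def log_softmax(x):
    """Numerically stable row-wise log-softmax."""
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))


def similarity_log_probs(q, k_pos, queue, tau):
    """Log-distribution of each query over [its positive key, all queue keys]."""
    pos = np.sum(q * k_pos, axis=1, keepdims=True)  # (N, 1)
    neg = q @ queue.T                               # (N, K)
    return log_softmax(np.concatenate([pos, neg], axis=1) / tau)


def distribution_divergence_loss(q_general, q_strong, k_pos, queue, tau=0.2):
    """Mean KL(p_general || p_strong) over the batch (assumed formulation).

    In a real training loop p_general would be detached (stop-gradient) so the
    strongly augmented branch is pulled toward the generally augmented one;
    plain NumPy has no autograd, so this only computes the loss value.
    """
    log_p_gen = similarity_log_probs(q_general, k_pos, queue, tau)
    log_p_str = similarity_log_probs(q_strong, k_pos, queue, tau)
    kl = np.sum(np.exp(log_p_gen) * (log_p_gen - log_p_str), axis=1)
    return float(kl.mean())


def l2_normalize(x):
    """Scale each row to unit L2 norm, as is usual for contrastive embeddings."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(48, 16, 3))  # 48 frames, 16 joints, xyz (assumed shape)
    strong_view = random_spatiotemporal_mask(upper_body_jitter(seq, rng=rng), rng=rng)
    q_general = l2_normalize(rng.normal(size=(4, 128)))
    q_strong = l2_normalize(rng.normal(size=(4, 128)))
    k_pos = l2_normalize(rng.normal(size=(4, 128)))
    queue = l2_normalize(rng.normal(size=(256, 128)))
    print(strong_view.shape)
    print(distribution_divergence_loss(q_general, q_strong, k_pos, queue))
```

Both augmentations copy their input, so the generally augmented view and the strongly augmented view can be built from the same sequence independently. The divergence is zero when the two queries induce identical similarity distributions and is positive otherwise. In this sketch it acts as a soft consistency term between the two views rather than as a replacement for the usual contrastive loss.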

URL

https://arxiv.org/abs/2405.04900

PDF

https://arxiv.org/pdf/2405.04900.pdf