Emotion

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

2024-04-25 15:15:36

Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng, Hung-Jen Chen, Chan-Feng Hsu, Hong-Han Shuai, Wen-Huang Cheng

arXiv_AI

arXiv_AI Recognition Classification Language_Model Transformer Pose Zero-Shot Emotion Chat
Abstract

Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions. This paradigm shows promising zero-shot results in various natural language processing tasks but is still unexplored in vision emotion understanding. In this work, we focus on enhancing the model's proficiency in understanding and adhering to instructions related to emotional contexts. Initially, we identify key visual clues critical to visual emotion recognition. Subsequently, we introduce a novel GPT-assisted pipeline for generating emotion visual instruction data, effectively addressing the scarcity of annotated instruction data in this domain. Expanding on the groundwork established by InstructBLIP, our proposed EmoVIT architecture incorporates emotion-specific instruction data, leveraging the powerful capabilities of Large Language Models to enhance performance. Through extensive experiments, our model showcases its proficiency in emotion classification, adeptness in affective reasoning, and competence in comprehending humor. The comparative analysis provides a robust benchmark for Emotion Visual Instruction Tuning in the era of LLMs, providing valuable insights and opening avenues for future exploration in this domain. Our code is available at this https URL.

Abstract (translated)

视觉指令微调是一种新的学习范式，涉及使用任务特定指令对预训练语言模型进行微调。这一范式在多种自然语言处理任务中展现出良好的零样本效果，但在视觉情感理解领域尚未得到探索。在这项工作中，我们专注于提高模型在理解并遵循与情感上下文相关的指令方面的能力。首先，我们识别出对视觉情感识别至关重要的关键视觉线索。接着，我们引入了一种新颖的GPT辅助流水线来生成情感视觉指令数据，有效解决了该领域中标注指令数据不足的问题。在InstructBLIP工作的基础上，我们提出的EmoVIT架构融入了情感特定的指令数据，并利用大型语言模型的强大能力来增强性能。通过广泛的实验，我们的模型在情感分类、情感推理和理解幽默方面展现了出色的能力。比较分析为LLM时代的情感视觉指令微调提供了一个稳健的基准，为这个领域提供了宝贵的见解，并开拓了未来的研究方向。我们的代码可在此处访问：this https URL。

URL

https://arxiv.org/abs/2404.16670

PDF

https://arxiv.org/pdf/2404.16670.pdf
Read All
Ada-DF: An Adaptive Label Distribution Fusion Network For Facial Expression Recognition

2024-04-24 08:07:16

Shu Liu, Yan Xu, Tongming Wan, Xiaoyan Kui

arXiv_CV

arXiv_CV Recognition Attention Emotion
Abstract

Facial expression recognition (FER) plays a significant role in our daily life. However, annotation ambiguity in the datasets could greatly hinder the performance. In this paper, we address FER task via label distribution learning paradigm, and develop a dual-branch Adaptive Distribution Fusion (Ada-DF) framework. One auxiliary branch is constructed to obtain the label distributions of samples. The class distributions of emotions are then computed through the label distributions of each emotion. Finally, those two distributions are adaptively fused according to the attention weights to train the target branch. Extensive experiments are conducted on three real-world datasets, RAF-DB, AffectNet and SFEW, where our Ada-DF shows advantages over the state-of-the-art works.
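
The fusion step described above can be illustrated with a minimal sketch. It assumes the attention weight is a scalar in [0, 1] and that fusion is a convex combination of the two distributions; the abstract does not state the exact formula, so the function names and the combination rule here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of adaptive label-distribution fusion.
# Assumption: fused = w * label_dist + (1 - w) * class_dist, with w in [0, 1].

def class_distribution(label_dists):
    """Average the label distributions of all samples of one emotion class."""
    n = len(label_dists)
    return [sum(column) / n for column in zip(*label_dists)]


def fuse(label_dist, class_dist, attention_weight):
    """Blend a sample's label distribution with its class distribution."""
    w = attention_weight
    return [w * l + (1 - w) * c for l, c in zip(label_dist, class_dist)]


# Two samples annotated as the same emotion, over three emotion classes.
samples = [[0.75, 0.25, 0.0], [0.25, 0.5, 0.25]]
class_dist = class_distribution(samples)
fused = fuse(samples[0], class_dist, attention_weight=0.5)
print(class_dist)  # [0.5, 0.375, 0.125]
print(fused)       # [0.625, 0.3125, 0.0625]
```

Because both inputs sum to 1 and the weights are convex, the fused target is itself a valid probability distribution.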

Abstract (translated)

面部表情识别（FER）在我们的日常生活中扮演着重要的角色。然而，数据集中注释的不明确性可能极大地阻碍了性能。在本文中，我们通过标签分布学习范式来解决FER任务，并开发了一个双分支自适应分布融合（Ada-DF）框架。一个辅助分支被构建来获得样本的标签分布。然后，通过每个情感的标签分布计算情感的类别分布。最后，根据注意权重动态地将这两个分布进行自适应融合，以训练目标分支。我们在三个真实世界数据集（RAF-DB，AffectNet和SFEW）上进行了广泛的实验，结果表明，与最先进的成果相比，我们的Ada-DF具有优势。

URL

https://arxiv.org/abs/2404.15714

PDF

https://arxiv.org/pdf/2404.15714.pdf
Read All
GLoD: Composing Global Contexts and Local Details in Image Generation

2024-04-23 18:39:57

Moyuru Yamada

arXiv_CV

arXiv_CV Quantitative Pose Denoising Action Emotion Diffusion
Abstract

Diffusion models have demonstrated their capability to synthesize high-quality and diverse images from textual prompts. However, simultaneous control over both global contexts (e.g., object layouts and interactions) and local details (e.g., colors and emotions) still remains a significant challenge. The models often fail to understand complex descriptions involving multiple objects and reflect specified visual attributes to wrong targets or ignore them. This paper presents Global-Local Diffusion (GLoD), a novel framework which allows simultaneous control over the global contexts and the local details in text-to-image generation without requiring training or fine-tuning. It assigns multiple global and local prompts to corresponding layers and composes their noises to guide a denoising process using pre-trained diffusion models. Our framework enables complex global-local compositions, conditioning objects in the global prompt with the local prompts while preserving other unspecified identities. Our quantitative and qualitative evaluations demonstrate that GLoD effectively generates complex images that adhere to both user-provided object interactions and object details.

Abstract (translated)

扩散模型已经证明了它们从文本提示中合成高质量和多样图像的能力。然而，同时控制全局上下文（例如物体布局和交互）和局部细节（例如颜色和情感）仍然是一个重要的挑战。模型通常无法理解涉及多个物体的复杂描述，并将指定的视觉属性错误地应用于错误的目标或忽略它们。本文提出了一种名为全局-局部扩散（GLoD）的新框架，允许在文本到图像生成中同时控制全局上下文和局部细节，而无需进行训练或微调。它将多个全局和局部提示分配给相应的层，并将它们的噪声组合起来，使用预训练的扩散模型进行去噪处理。我们的框架能够实现复杂的全局-局部组合，通过局部提示保留全局提示，同时保留其他未指定身份的物体。我们的定量和定性评估显示，GLoD有效地生成了符合用户提供的物体交互和物体细节的复杂图像。

URL

https://arxiv.org/abs/2404.15447

PDF

https://arxiv.org/pdf/2404.15447.pdf
Read All
Identifying Fairness Issues in Automatically Generated Testing Content

2024-04-23 14:56:15

Kevin Stowe, Benny Longwill, Alyssa Francis, Tatsuya Aoyama, Debanjan Ghosh, Swapna Somasundaran

arXiv_CL

arXiv_CL Classification Language_Model Bert Few-Shot Emotion
Abstract

Natural language generation tools are powerful and effective for generating content. However, language models are known to display bias and fairness issues, making them impractical to deploy for many use cases. We here focus on how fairness issues impact automatically generated test content, which can have stringent requirements to ensure the test measures only what it was intended to measure. Specifically, we identify test content that is focused on particular domains and experiences that only reflect a certain demographic or that are potentially emotionally upsetting; both of which could inadvertently impact a test-taker's score. This kind of content doesn't reflect typical biases out of context, making it challenging even for modern models that contain safeguards. We build a dataset of 621 generated texts annotated for fairness and explore a variety of methods for classification: fine-tuning, topic-based classification, and prompting, including few-shot and self-correcting prompts. We find that combining prompt self-correction and few-shot learning performs best, yielding an F1 score of .791 on our held-out test set, while much smaller BERT- and topic-based models have competitive performance on out-of-domain data.

Abstract (translated)

自然语言生成工具对于生成内容非常强大和有效。然而，语言模型已知存在偏见和公平性问题，这使得它们在许多用例中难以实际部署。在这里，我们关注公平性问题如何影响自动生成的测试内容，这类内容可能有严格的要求，以确保测试只衡量其本应衡量的内容。具体来说，我们识别出聚焦于特定领域和经历、仅反映某一人群的测试内容，或可能引起情绪不适的测试内容；这两者都可能无意中影响应试者的得分。这类内容脱离上下文时并不体现典型的偏见，因此即使对包含安全措施的现代模型而言也具有挑战性。我们建立了一个包含621个生成文本、并针对公平性进行标注的数据集，并探讨了多种分类方法：微调、基于主题的分类和提示，包括少样本和自纠正提示。我们发现，结合提示自纠正和少样本学习效果最好，在留出测试集上的F1分数为.791，而规模小得多的BERT模型和基于主题的模型在域外数据上也具有有竞争力的性能。

URL

https://arxiv.org/abs/2404.15104

PDF

https://arxiv.org/pdf/2404.15104.pdf
Read All
CAGE: Circumplex Affect Guided Expression Inference

2024-04-23 12:30:17

Niklas Wagner, Felix Mätzler, Samed R. Vossberg, Helen Schneider, Svetlana Pavlitska, J. Marius Zöllner

arXiv_CV

arXiv_CV Inference Prediction Pose Emotion
Abstract

Understanding emotions and expressions is a task of interest across multiple disciplines, especially for improving user experiences. Contrary to the common perception, it has been shown that emotions are not discrete entities but instead exist along a continuum. People understand discrete emotions differently due to a variety of factors, including cultural background, individual experiences, and cognitive biases. Therefore, most approaches to expression understanding, particularly those relying on discrete categories, are inherently biased. In this paper, we present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect. Further, we propose a model for the prediction of facial expressions tailored for lightweight applications. Using a small-scaled MaxViT-based model architecture, we evaluate the impact of discrete expression category labels in training with the continuous valence and arousal labels. We show that considering valence and arousal in addition to discrete category labels helps to significantly improve expression inference. The proposed model outperforms the current state-of-the-art models on AffectNet, establishing it as the best-performing model for inferring valence and arousal achieving a 7% lower RMSE. Training scripts and trained weights to reproduce our results can be found here: this https URL.
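
For readers unfamiliar with the metric, the following minimal sketch shows how RMSE over continuous valence or arousal predictions is computed, and how a relative reduction such as "7% lower" would be derived. The numeric values are made-up illustrations, not results from the paper, and the function names are assumptions.

```python
import math


def rmse(predicted, target):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, new):
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


# Illustrative valence predictions in [-1, 1] (hypothetical values).
predicted = [0.5, -0.5, 0.5, -0.5]
target = [0.0, 0.0, 0.0, 0.0]
print(rmse(predicted, target))  # 0.5

# A hypothetical baseline RMSE of 0.5 versus a new RMSE of 0.465.
print(round(relative_reduction(0.5, 0.465), 2))  # 0.07
```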

Abstract (translated)

理解情感和表情是一个跨越多个学科的任务，尤其是在提高用户体验方面。与普遍认识相反，已有研究表明情感并不是离散的实体，而是存在于一个连续体上。由于各种因素（包括文化背景、个人经历和认知偏见），人们对离散情感的理解存在差异。因此，大多数表情理解方法，尤其是那些依赖离散类别的方法，在本质上存在偏见。在本文中，我们对两个常见的数据集（AffectNet和EMOTIC）进行了深入的比较分析，这些数据集配备了情感环状模型（circumplex model of affect）的组成部分。此外，我们提出了一个专为轻量级应用设计的面部表情预测模型。通过小规模的基于MaxViT的模型架构，我们评估了在训练中将离散表情类别标签与连续的效价（valence）和唤醒度（arousal）标签结合使用的影响。我们发现，在离散类别标签之外同时考虑效价和唤醒度，有助于显著提高表情推断的效果。所提出的模型在AffectNet上优于当前最先进的模型，成为推断效价和唤醒度的最佳模型，RMSE降低了7%。用于复现我们结果的训练脚本和训练权重可以在这里找到：this https URL。

URL

https://arxiv.org/abs/2404.14975

PDF

https://arxiv.org/pdf/2404.14975.pdf
Read All
Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals

2024-04-22 05:12:52

Qingyang Wu, Ying Xu, Tingsong Xiao, Yunze Xiao, Yitong Li, Tianyang Wang, Yichi Zhang, Shanghai Zhong, Yuwei Zhang, Wei Lu, Yifan Yang

arXiv_AI

arXiv_AI Review Survey Recommendation Language_Model Pose Emotion
Abstract

Large Language Models (LLMs) have emerged as potent tools for advancing the United Nations' Sustainable Development Goals (SDGs). However, the attitudinal disparities between LLMs and humans towards these goals can pose significant challenges. This study conducts a comprehensive review and analysis of the existing literature on the attitudes of LLMs towards the 17 SDGs, emphasizing the comparison between their attitudes and support for each goal and those of humans. We examine the potential disparities, primarily focusing on aspects such as understanding and emotions, cultural and regional differences, task objective variations, and factors considered in the decision-making process. These disparities arise from the underrepresentation and imbalance in LLM training data, historical biases, quality issues, lack of contextual understanding, and skewed ethical values reflected. The study also investigates the risks and harms that may arise from neglecting the attitudes of LLMs towards the SDGs, including the exacerbation of social inequalities, racial discrimination, environmental destruction, and resource wastage. To address these challenges, we propose strategies and recommendations to guide and regulate the application of LLMs, ensuring their alignment with the principles and goals of the SDGs, and therefore creating a more just, inclusive, and sustainable future.

Abstract (translated)

大语言模型（LLMs）已成为促进联合国可持续发展目标（SDGs）的有力工具。然而，LLMs和人类之间针对这些目标的態度差异可能会带来重大挑战。这项研究对现有文献进行了全面回顾和分析，重点关注LLMs对17个SDGs的態度，强调它们的态度和支持与人类的相比。我们检查了可能存在的差异，主要关注理解与情感、文化地区差异、任务目标变化和决策过程因素等方面。这些差异源于LLM训练数据的不足和失衡，历史偏见，质量问题，缺乏语境理解，以及反映伦理价值观的失衡。研究还探讨了忽视LLMs对SDGs的態度可能产生的风险和危害，包括加剧社会不平等、种族歧视、环境破坏和资源浪费。为了应对这些挑战，我们提出了指导和管理LLM使用的策略和建议，确保其与SDGs的原则和目标保持一致，从而为创造一个更加公正、包容和可持续的未来做出贡献。

URL

https://arxiv.org/abs/2404.13885

PDF

https://arxiv.org/pdf/2404.13885.pdf
Read All
Multi-channel Emotion Analysis for Consensus Reaching in Group Movie Recommendation Systems

2024-04-21 21:19:31

Adilet Yerkin, Elnara Kadyrgali, Yerdauit Torekhan, Pakizar Shamoi

arXiv_AI

arXiv_AI Detection Survey Recommendation Relation Inference Pose Emotion
Abstract

Watching movies is one of the social activities typically done in groups. Emotion is the most vital factor that affects movie viewers' preferences. So, the emotional aspect of the movie needs to be determined and analyzed for further recommendations. It can be challenging to choose a movie that appeals to the emotions of a diverse group. Reaching an agreement for a group can be difficult due to the various genres and choices. This paper proposes a novel approach to group movie suggestions by examining emotions from three different channels: movie descriptions (text), soundtracks (audio), and posters (image). We employ the Jaccard similarity index to match each participant's emotional preferences to prospective movie choices, followed by a fuzzy inference technique to determine group consensus. We use a weighted integration process for the fusion of emotion scores from diverse data types. Then, group movie recommendation is based on prevailing emotions and viewers' best-loved movies. After determining the recommendations, the group's consensus level is calculated using a fuzzy inference system, taking participants' feedback as input. Participants (n=130) in the survey were provided with different emotion categories and asked to select the emotions best suited for particular movies (n=12). Comparison results between predicted and actual scores demonstrate the efficiency of using emotion detection for this problem (Jaccard similarity index = 0.76). We explored the relationship between induced emotions and movie popularity as an additional experiment, analyzing emotion distribution in 100 popular movies from the TMDB database. Such systems can potentially improve the accuracy of movie recommendation systems and achieve a high level of consensus among participants with diverse preferences.
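
The matching step mentioned above relies on the Jaccard similarity index, |A ∩ B| / |A ∪ B|. The sketch below applies it to sets of emotion labels; the emotion names, movie titles, and the ranking helper are illustrative assumptions rather than the authors' data or code.

```python
def jaccard(a, b):
    """Jaccard similarity index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def rank_movies(preferred_emotions, movie_emotions):
    """Sort movie titles by similarity to one participant's preferences."""
    return sorted(
        movie_emotions,
        key=lambda title: jaccard(preferred_emotions, movie_emotions[title]),
        reverse=True,
    )


# Hypothetical participant preferences and per-movie detected emotions.
participant = {"joy", "surprise", "fear"}
movies = {
    "Movie A": {"joy", "surprise", "sadness"},
    "Movie B": {"anger", "sadness"},
}
print(jaccard(participant, movies["Movie A"]))  # 0.5
print(rank_movies(participant, movies))         # ['Movie A', 'Movie B']
```

The group consensus step (fuzzy inference) is not sketched here, since the abstract gives no detail about its rules or membership functions.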

Abstract (translated)

看电影是通常以群体形式进行的社交活动之一。情感是影响电影观众偏好的最重要因素。因此，需要确定并分析电影的情感方面，以便进一步推荐。选择一部能打动多元化群体情感的电影可能具有挑战性。由于类型和选择多种多样，群体达成一致意见可能很难。本文提出了一种新颖的群体电影推荐方法，通过三个不同的渠道来考察情感：电影描述（文本）、配乐（音频）和海报（图像）。我们使用Jaccard相似性指数将每个参与者的情感偏好与候选电影进行匹配，然后使用模糊推理技术确定群体共识。我们使用加权集成过程对来自不同数据类型的情感分数进行融合。然后，群体电影推荐基于主导情感和观众最喜爱的电影。在确定推荐后，以参与者的反馈作为输入，使用模糊推理系统计算群体的共识水平。调查中的参与者（n=130）被提供了不同的情感类别，并被要求为特定电影（n=12）选择最合适的情感。预测得分和实际得分的比较结果证明了使用情感检测解决这个问题的有效性（Jaccard相似性指数 = 0.76）。作为一项附加实验，我们还探讨了诱发情感与电影受欢迎程度之间的关系，分析了TMDB数据库中100部热门电影的情感分布。这样的系统有潜力提高电影推荐系统的准确性，并在偏好各异的参与者之间实现高水平的共识。

URL

https://arxiv.org/abs/2404.13778

PDF

https://arxiv.org/pdf/2404.13778.pdf
Read All
Using Adaptive Empathetic Responses for Teaching English

2024-04-21 20:21:24

Li Siyan, Teresa Shao, Zhou Yu, Julia Hirschberg

arXiv_CL

arXiv_CL Detection QA Optimization Transformer Pose Emotion Chat
Abstract

Existing English-teaching chatbots rarely incorporate empathy explicitly in their feedback, but empathetic feedback could help keep students engaged and reduce learner anxiety. Toward this end, we propose the task of negative emotion detection via audio, for recognizing empathetic feedback opportunities in language learning. We then build the first spoken English-teaching chatbot with adaptive, empathetic feedback. This feedback is synthesized through automatic prompt optimization of ChatGPT and is evaluated with English learners. We demonstrate the effectiveness of our system through a preliminary user study.

Abstract (translated)

目前，英语教学聊天机器人很少明确地在其反馈中包含情感，但是情感反馈可以帮助保持学生们的参与度，减轻学习焦虑。为此，我们提出了一个通过音频检测消极情绪的任务，以识别语言学习中情感反馈的机会。然后，我们构建了第一个带有自适应和情感反馈的英语教学聊天机器人。通过ChatGPT的自动提示优化来合成这种反馈，并使用英语学习者进行评估。我们通过初步用户研究来展示我们系统的有效性。

URL

https://arxiv.org/abs/2404.13764

PDF

https://arxiv.org/pdf/2404.13764.pdf
Read All
Robust EEG-based Emotion Recognition Using an Inception and Two-sided Perturbation Model

2024-04-21 07:54:43

Shadi Sartipi, Mujdat Cetin

arXiv_AI

arXiv_AI Recognition Deep_Learning Adversarial Face Attention Pose Emotion
Abstract

Automated emotion recognition using electroencephalogram (EEG) signals has gained substantial attention. Although deep learning approaches exhibit strong performance, they often suffer from vulnerabilities to various perturbations, like environmental noise and adversarial attacks. In this paper, we propose an Inception feature generator and two-sided perturbation (INC-TSP) approach to enhance emotion recognition in brain-computer interfaces. INC-TSP integrates the Inception module for EEG data analysis and employs two-sided perturbation (TSP) as a defensive mechanism against input perturbations. TSP introduces worst-case perturbations to the model's weights and inputs, reinforcing the model's elasticity against adversarial attacks. The proposed approach addresses the challenge of maintaining accurate emotion recognition in the presence of input uncertainties. We validate INC-TSP in a subject-independent three-class emotion recognition scenario, demonstrating robust performance.

Abstract (translated)

利用脑电图（EEG）信号进行自动情感识别已经引起了大量关注。虽然深度学习方法表现出强大的性能，但它们通常容易受到各种扰动的影响，比如环境噪声和攻击性扰动。在本文中，我们提出了一种Inception特征生成器和双侧扰动（INC-TSP）方法，以增强脑机接口中的情感识别。INC-TSP将Inception模块与EEG数据分析相结合，并使用双侧扰动（TSP）作为防御措施来对抗输入扰动。TSP将模型的权重和输入引入最坏情况扰动，增强了模型对攻击性扰动的弹性。所提出的方法解决了在输入不确定性的存在下保持准确情感识别的挑战。我们在一个独立于受试者的三分类情感识别场景中验证了INC-TSP，证明了其稳健性能。

URL

https://arxiv.org/abs/2404.15373

PDF

https://arxiv.org/pdf/2404.15373.pdf
Read All
MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention

2024-04-21 02:44:17

Xinxin Jiao, Liejun Wang, Yinfeng Yu

arXiv_AI

arXiv_AI Recognition Attention Bert Pose Action Emotion Speech
Abstract

Speech emotion recognition is crucial in human-computer interaction, but extracting and using emotional cues from audio poses challenges. This paper introduces MFHCA, a novel method for Speech Emotion Recognition using Multi-Spatial Fusion and Hierarchical Cooperative Attention on spectrograms and raw audio. We employ the Multi-Spatial Fusion module (MF) to efficiently identify emotion-related spectrogram regions and integrate Hubert features for higher-level acoustic information. Our approach also includes a Hierarchical Cooperative Attention module (HCA) to merge features from various auditory levels. We evaluate our method on the IEMOCAP dataset and achieve 2.6% and 1.87% improvements on the weighted accuracy and unweighted accuracy, respectively. Extensive experiments demonstrate the effectiveness of the proposed method.
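
The two metrics reported above are commonly defined in speech emotion recognition as follows: weighted accuracy (WA) is the overall fraction of correct predictions, and unweighted accuracy (UA) is the mean of per-class recalls. The sketch below assumes those conventional definitions, which the abstract does not spell out; the labels are illustrative.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Imbalanced toy example: four "angry" utterances, two "happy" ones.
y_true = ["angry"] * 4 + ["happy"] * 2
y_pred = ["angry"] * 4 + ["happy", "angry"]
print(weighted_accuracy(y_true, y_pred))    # 5 of 6 correct
print(unweighted_accuracy(y_true, y_pred))  # 0.75
```

On imbalanced data the two metrics diverge, which is why papers in this area tend to report both.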

Abstract (translated)

语音情感识别在人与计算机交互中至关重要，但提取和利用音频中的情感线索仍然具有挑战性。本文介绍了一种名为MFHCA的新方法，用于基于多空间融合和层次合作注意的语音情感识别。我们采用多空间融合模块（MF）来有效地识别与情感相关的频谱图区域，并利用Hubert特征获取更高层次的音频信息。我们的方法还包括一个层次合作注意模块（HCA），以合并来自不同音频层次的特征。我们在IEMOCAP数据集上评估我们的方法，分别实现了2.6%和1.87%的加权准确性和无加权准确性的提高。大量实验证明所提出的方法的有效性。

URL

https://arxiv.org/abs/2404.13509

PDF

https://arxiv.org/pdf/2404.13509.pdf
Read All
Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News

2024-04-21 00:14:03

Qixuan Zhang, Zhifeng Wang, Yang Liu, Zhenyue Qin, Kaihao Zhang, Sabrina Caldwell, Tom Gedeon

arXiv_CV

arXiv_CV Recognition Deep_Learning Relation Transformer Emotion Facial_Landmark
Abstract

In this paper, we present a novel benchmark for Emotion Recognition using facial landmarks extracted from realistic news videos. Traditional methods relying on RGB images are resource-intensive, whereas our approach with Facial Landmark Emotion Recognition (FLER) offers a simplified yet effective alternative. By leveraging Graph Neural Networks (GNNs) to analyze the geometric and spatial relationships of facial landmarks, our method enhances the understanding and accuracy of emotion recognition. We discuss the advancements and challenges in deep learning techniques for emotion recognition, particularly focusing on Graph Neural Networks (GNNs) and Transformers. Our experimental results demonstrate the viability and potential of our dataset as a benchmark, setting a new direction for future research in emotion recognition technologies. The codes and models are at: this https URL

Abstract (translated)

在本文中，我们提出了一个新的情感识别基准，它使用从真实新闻视频中提取的人脸特征点。依赖RGB图像的传统方法资源消耗大，而我们采用的人脸特征点情感识别（Facial Landmark Emotion Recognition，FLER）方法提供了一种简化而有效的替代方案。通过利用图神经网络（GNNs）分析人脸特征点的几何和空间关系，我们的方法提高了情感识别的理解和准确性。我们讨论了用于情感识别的深度学习技术的进展和挑战，特别关注图神经网络（GNNs）和Transformer。我们的实验结果证明了我们的数据集作为基准的可行性和潜力，为未来情感识别技术的研究开辟了新的方向。代码和模型位于：this https URL

URL

https://arxiv.org/abs/2404.13493

PDF

https://arxiv.org/pdf/2404.13493.pdf
Read All
TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition

2024-04-19 16:09:17

Chengxin Chen, Pengyuan Zhang

arXiv_SD

arXiv_SD Recognition Pose Emotion Enhancement Speech
Abstract

One persistent challenge in Speech Emotion Recognition (SER) is the ubiquitous environmental noise, which frequently results in diminished SER performance in practical use. In this paper, we introduce a Two-level Refinement Network, dubbed TRNet, to address this challenge. Specifically, a pre-trained speech enhancement module is employed for front-end noise reduction and noise level estimation. Later, we utilize clean speech spectrograms and their corresponding deep representations as reference signals to refine the spectrogram distortion and representation shift of enhanced speech during model training. Experimental results validate that the proposed TRNet substantially increases the system's robustness in both matched and unmatched noisy environments, without compromising its performance in clean environments.

Abstract (translated)

语音情感识别（Speech Emotion Recognition，SER）中一个长期存在的挑战是无处不在的环境噪声，这经常导致SER在实际应用中性能下降。在本文中，我们提出了一种名为TRNet的两级细化网络来解决这一挑战。具体来说，我们采用预训练的语音增强模块进行前端降噪和噪声水平估计。随后，在模型训练期间，我们利用干净语音的频谱图及其对应的深度表示作为参考信号，来修正增强语音的频谱图失真和表示偏移。实验结果证实，所提出的TRNet在匹配和不匹配的噪声环境中都显著提高了系统的鲁棒性，同时没有牺牲其在干净环境中的性能。

URL

https://arxiv.org/abs/2404.12979

PDF

https://arxiv.org/pdf/2404.12979.pdf
Read All
Multi Class Depression Detection Through Tweets using Artificial Intelligence

2024-04-19 12:47:56

Muhammad Osama Nusrat, Waseem Shahzad, Saad Ahmed Jamal

arXiv_AI

arXiv_AI GAN Detection Deep_Learning Face Bert Transformer Pose Action Emotion
Abstract

Depression is a significant issue nowadays. As per the World Health Organization (WHO), in 2023, over 280 million individuals are grappling with depression. This is a huge number; if not taken seriously, these numbers will increase rapidly. About 4.89 billion individuals are social media users. People express their feelings and emotions on platforms like Twitter, Facebook, Reddit, Instagram, etc. These platforms contain valuable information which can be used for research purposes. Considerable research has been conducted across various social media platforms. However, certain limitations persist in these endeavors. Particularly, previous studies were only focused on detecting depression and the intensity of depression in tweets. Also, there existed inaccuracies in dataset labeling. In this research work, five types of depression (Bipolar, major, psychotic, atypical, and postpartum) were predicted using tweets from the Twitter database based on lexicon labeling. Explainable AI was used to provide reasoning by highlighting the parts of tweets that represent type of depression. Bidirectional Encoder Representations from Transformers (BERT) was used for feature extraction and training. Machine learning and deep learning methodologies were used to train the model. The BERT model presented the most promising results, achieving an overall accuracy of 0.96.

Abstract (translated)

抑郁症是当今一个重要的问题。根据世界卫生组织（WHO）的数据，2023年有超过2.8亿人正受抑郁症困扰。这是一个巨大的数字；如果得不到重视，这些数字会迅速增加。大约48.9亿人是社交媒体用户。人们会在Twitter、Facebook、Reddit、Instagram等平台上表达他们的感受和情绪。这些平台包含可以用于研究目的的有价值的信息。已有大量研究在各个社交媒体平台上展开，但是这些工作仍存在一些局限性。特别是，以前的研究仅关注检测推文中的抑郁症及其严重程度。此外，数据集标注存在不准确的问题。在这项研究中，我们基于词典标注，利用Twitter数据库中的推文预测了五种抑郁症类型（双相、重度、精神病性、非典型和产后抑郁症）。我们利用可解释AI，通过突出推文中体现抑郁症类型的部分来提供推理依据。基于Transformer的双向编码器表示（BERT）被用于特征提取和训练。机器学习和深度学习方法被用于训练模型。BERT模型取得了最有前景的结果，总体准确率达到0.96。

URL

https://arxiv.org/abs/2404.13104

PDF

https://arxiv.org/pdf/2404.13104.pdf
Read All
Food Development through Co-creation with AI: bread with a 'taste of love'

2024-04-19 10:03:59

Takuya Sera, Izumi Kuwata, Yuki Taya, Noritaka Shimura, Yosuke Motohashi

arXiv_AI

arXiv_AI Recommendation Relation Emotion
Abstract

This study explores a new method in food development by utilizing AI including generative AI, aiming to craft products that delight the senses and resonate with consumers' emotions. The food ingredient recommendation approach used in this study can be considered as a form of multimodal generation in a broad sense, as it takes text as input and outputs food ingredient candidates. This study focused on producing "Romance Bread," a collection of breads infused with flavors that reflect the nuances of a romantic Japanese television program. We analyzed conversations from TV programs and lyrics from songs featuring fruits and sweets to recommend ingredients that express romantic feelings. Based on these recommendations, the bread developers then considered the flavoring of the bread and developed new bread varieties. The research included a tasting evaluation involving 31 participants and interviews with the product developers. Findings indicate a notable correlation between tastes generated by AI and human preferences. This study validates the concept of using AI in food innovation and highlights the broad potential for developing unique consumer experiences that focus on emotional engagement through AI and human collaboration.

Abstract (translated)

本研究探索了一种利用包括生成式AI在内的AI进行食品开发的新方法，旨在打造令感官愉悦并与消费者情感产生共鸣的产品。本研究中使用的食品配料推荐方法在广义上可以被视为一种多模态生成形式，因为它以文本为输入并输出食品配料候选项。本研究专注于制作“浪漫面包”，这是一系列融入了能反映一档日本恋爱电视节目细微情感的风味的面包。我们分析了电视节目中的对话以及提及水果和甜点的歌曲的歌词，以推荐表达浪漫情感的配料。根据这些推荐，面包开发者随后考虑了面包的调味并开发了新品种的面包。研究包括有31名参与者的品尝评估和对产品开发者的访谈。研究结果表明，AI生成的味道与人类偏好之间存在显著的相关性。本研究验证了将AI用于食品创新的概念，并强调了通过AI与人类协作开发注重情感投入的独特消费者体验的广泛潜力。

URL

https://arxiv.org/abs/2404.12760

PDF

https://arxiv.org/pdf/2404.12760.pdf
Read All
SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

2024-04-19 06:58:51

Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu

arXiv_CL

arXiv_CL Detection Deep_Learning Classification Language_Model Bert Emotion
Abstract

On social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with a weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced, large language model based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65 percentage points in F1-score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.
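
The weighted F1 score cited for the fine-grained task is conventionally the per-class F1 averaged with weights proportional to each class's support. The sketch below assumes that conventional definition, since the abstract does not define it; the labels are a toy example, not data from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1  # weight by class frequency
    return score


# Toy example with an imbalanced label set (risk levels 0 and 1).
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
# Class 0: F1 = 0.8 (support 3); class 1: F1 = 2/3 (support 1).
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7667
```

With eleven risk levels and uneven class sizes, this weighting lets frequent levels dominate the score, which is worth keeping in mind when reading the 50.89% figure.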

Abstract (translated)

在社交媒体上，用户经常表达个人情感，其中可能包括潜在的自杀倾向的一部分。互联网语言中隐含和多样形式的表达使准确和快速识别社交媒体上的自杀意图具有挑战性，从而为及时干预努力创造了困难。为识别自杀风险的发展深度学习模型是一个有前景的解决方案，但在中文背景下，相关数据明显不足。为了填补这一空白，本研究针对精细自杀风险分类的中国社交媒体数据集进行了研究，重点关注自杀意图的表现、自杀方法和时间的紧迫性等指标。在两个任务中评估了7个预训练模型：高自杀风险和低自杀风险，以及精细自杀风险分类级别为0到10。在我们的实验中，深度学习模型在区分高和低自杀风险方面表现良好，最佳模型达到88.39%的F1得分。然而，精细自杀风险分类的结果仍然不令人满意，权重在F1得分上的F1分数为50.89%。为了解决数据不平衡和数据集有限的问题，我们研究了传统和先进的大型语言模型数据增强技术，证明数据增强可以通过提高F1得分最多4.65个百分点来增强模型的性能。值得注意的是，在心理领域数据预训练的中文MentalBERT模型在两个任务中都表现出色。这项研究为自动识别自杀个体提供了宝贵的见解，促进了社交媒体平台上的及时心理干预。源代码和数据公开可用。

URL

https://arxiv.org/abs/2404.12659

PDF

https://arxiv.org/pdf/2404.12659.pdf
Read All
Cooperative Sentiment Agents for Multimodal Sentiment Analysis

2024-04-19 05:48:09

Shanmin Wang, Hui Shuai, Qingshan Liu, Fei Wang

arXiv_CV

arXiv_CV Recognition Represenation_Learning Sentiment Pose Action Emotion Reconstruction Agent
Abstract

In this paper, we propose a new Multimodal Representation Learning (MRL) method for Multimodal Sentiment Analysis (MSA), which facilitates the adaptive interaction between modalities through Cooperative Sentiment Agents, named Co-SA. Co-SA comprises two critical components: the Sentiment Agents Establishment (SAE) phase and the Sentiment Agents Cooperation (SAC) phase. During the SAE phase, each sentiment agent deals with a unimodal signal and highlights explicit dynamic sentiment variations within the modality via the Modality-Sentiment Disentanglement (MSD) and Deep Phase Space Reconstruction (DPSR) modules. Subsequently, in the SAC phase, Co-SA meticulously designs task-specific interaction mechanisms for sentiment agents to coordinate multimodal signals and learn the joint representation. Specifically, Co-SA equips each sentiment agent with an independent policy model that captures significant properties within the modality. These policies are optimized mutually through a unified reward adapted to downstream tasks. Benefiting from this reward mechanism, Co-SA transcends the limitation of pre-defined fusion modes and adaptively captures unimodal properties for MRL in the multimodal interaction setting. To demonstrate the effectiveness of Co-SA, we apply it to address Multimodal Sentiment Analysis (MSA) and Multimodal Emotion Recognition (MER) tasks. Our comprehensive experimental results demonstrate that Co-SA excels at discovering diverse cross-modal features, encompassing both common and complementary aspects. The code is available at this https URL.

Abstract (translated)

在本文中，我们提出了一个新的多模态表示学习（MRL）方法，名为合作情感代理（Co-SA），用于多模态情感分析（MSA），并通过合作情感代理促进模态之间的自适应交互。Co-SA包括两个关键组件：情感代理建立（SAE）阶段和情感代理合作（SAC）阶段。在SAE阶段，每个情感代理处理一个单模态信号，并通过模态情感解离（MSD）和深度时域重构（DPSR）模块在模态内突出显示动态情感变化。然后，在SAC阶段，Co-SA精心设计了一系列任务特定的情感代理交互机制，以协调多模态信号以学习联合表示。具体来说，Co-SA为每个情感代理配备了一个独立的政策模型，该模型捕捉模态内的显著属性。这些策略通过统一奖励适应下游任务进行优化。得益于奖励机制，Co-SA超越了预定义的融合模式，并适应了多模态交互设置中的情感代理学习（MRL）。为了证明Co-SA的有效性，我们将它应用于情感多模态分析和情感识别任务。我们全面的实验结果表明，Co-SA在发现跨模态特征方面表现出色，涵盖模态共性和互补性的各个方面。代码可以从该链接获取。

URL

https://arxiv.org/abs/2404.12642

PDF

https://arxiv.org/pdf/2404.12642.pdf
Read All
Augmenting emotion features in irony detection with Large language modeling

2024-04-18 16:11:17

Yucheng Lin, Yuhan Xia, Yunfei Long

arXiv_AI

arXiv_AI Detection Knowledge Language_Model Bert Transformer Emotion Enhancement Chat
Abstract

This study introduces a novel method for irony detection, applying Large Language Models (LLMs) with prompt-based learning to facilitate emotion-centric text augmentation. Traditional irony detection techniques typically fall short due to their reliance on static linguistic features and predefined knowledge bases, often overlooking the nuanced emotional dimensions integral to irony. In contrast, our methodology augments the detection process by integrating subtle emotional cues, augmented through LLMs, into three benchmark pre-trained NLP models - BERT, T5, and GPT-2 - which are widely recognized as foundational in irony detection. We assessed our method using the SemEval-2018 Task 3 dataset and observed substantial enhancements in irony detection capabilities.

Abstract (translated)

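This study presents a novel irony detection method that applies large language models (LLMs) with prompt-based learning to facilitate emotion-centric text augmentation. Traditional irony detection techniques typically fall short because they rely on static linguistic features and predefined knowledge bases, often overlooking the subtle emotional dimensions that are essential to irony. In contrast, our method enhances the detection process by integrating subtle emotional cues, augmented through LLMs, into three pre-trained NLP models widely regarded as foundational for irony detection: BERT, T5, and GPT-2. We evaluated the method on the SemEval-2018 Task 3 dataset and observed a substantial improvement in irony detection capability.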

URL

https://arxiv.org/abs/2404.12291

PDF

https://arxiv.org/pdf/2404.12291.pdf
Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models

2024-04-18 15:28:34

Israel A. Laurensi, Alceu de Souza Britto Jr., Jean Paul Barddal, Alessandro Lameiras Koerich

arXiv_CV

arXiv_CV GAN CNN Recognition Adversarial Knowledge Pose Emotion
Abstract

Facial expression recognition is a pivotal component in machine learning, facilitating various applications. However, convolutional neural networks (CNNs) are often plagued by catastrophic forgetting, impeding their adaptability. The proposed method, emotion-centered generative replay (ECgr), tackles this challenge by integrating synthetic images from generative adversarial networks. Moreover, ECgr incorporates a quality assurance algorithm to ensure the fidelity of generated images. This dual approach enables CNNs to retain past knowledge while learning new tasks, enhancing their performance in emotion recognition. The experimental results on four diverse facial expression datasets demonstrate that incorporating images generated by our pseudo-rehearsal method enhances training on the targeted dataset and the source dataset while making the CNN retain previously learned knowledge.
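
The general idea of generative replay with a quality gate can be sketched as follows: synthetic samples for earlier tasks are kept only if a quality score passes a threshold, and the survivors are mixed into batches for the new task. This is a minimal sketch under stated assumptions, not the authors' ECgr algorithm; the score source, the threshold, the replay ratio, and all names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Sample = Tuple[str, int]  # (image identifier, emotion label)


def filter_synthetic(candidates: List[Tuple[Sample, float]],
                     threshold: float = 0.8) -> List[Sample]:
    """Keep generated samples whose (assumed) quality score meets the threshold."""
    return [sample for sample, score in candidates if score >= threshold]


def build_replay_batch(new_samples: List[Sample], replay_pool: List[Sample],
                       batch_size: int = 4, replay_ratio: float = 0.5,
                       rng: Optional[random.Random] = None) -> List[Sample]:
    """Mix replayed synthetic samples with samples from the new task."""
    rng = rng or random.Random(0)
    n_replay = min(int(batch_size * replay_ratio), len(replay_pool))
    n_new = min(batch_size - n_replay, len(new_samples))
    batch = rng.sample(replay_pool, n_replay) + rng.sample(new_samples, n_new)
    rng.shuffle(batch)
    return batch


# Hypothetical generated samples with quality scores from some checker.
generated = [(("gen_0", 0), 0.93), (("gen_1", 0), 0.41), (("gen_2", 1), 0.88)]
kept = filter_synthetic(generated)
new_task = [("real_0", 2), ("real_1", 2), ("real_2", 3)]
batch = build_replay_batch(new_task, kept)
print(kept, batch)
```

A real pipeline would draw the candidates from a trained GAN and compute the scores with an actual quality check; both are replaced by fixed values here.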

Abstract (translated)

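Facial expression recognition is an important component of machine learning and supports a variety of applications. However, convolutional neural networks (CNNs) often suffer from catastrophic forgetting, which hinders their adaptability. The proposed method, emotion-centered generative replay (ECgr), addresses this challenge by incorporating synthetic images generated by generative adversarial networks (GANs). In addition, ECgr includes a quality assurance algorithm to ensure the fidelity of the generated images. This dual approach enables CNNs to retain past knowledge while learning new tasks, improving their performance in emotion recognition. Experimental results on four diverse facial expression datasets show that incorporating images generated by our pseudo-rehearsal method improves training on both the target dataset and the source dataset while allowing the CNN to retain previously learned knowledge.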

URL

https://arxiv.org/abs/2404.12260

PDF

https://arxiv.org/pdf/2404.12260.pdf
Dynamic Modality and View Selection for Multimodal Emotion Recognition with Missing Modalities

2024-04-18 15:18:14

Luciana Trinkaus Menon, Luiz Carlos Ribeiro Neduziak, Jean Paul Barddal, Alessandro Lameiras Koerich, Alceu de Souza Britto Jr

arXiv_CV

arXiv_CV Recognition Attention Prediction Emotion Speech
Abstract

The study of human emotions, traditionally a cornerstone in fields like psychology and neuroscience, has been profoundly impacted by the advent of artificial intelligence (AI). Multiple channels, such as speech (voice) and facial expressions (image), are crucial in understanding human emotions. However, AI's journey in multimodal emotion recognition (MER) is marked by substantial technical challenges. One significant hurdle is how AI models manage the absence of a particular modality - a frequent occurrence in real-world situations. This study's central focus is assessing the performance and resilience of two strategies when confronted with the lack of one modality: a novel multimodal dynamic modality and view selection and a cross-attention mechanism. Results on the RECOLA dataset show that dynamic selection-based methods are a promising approach for MER. In the missing modalities scenarios, all dynamic selection-based methods outperformed the baseline. The study concludes by emphasizing the intricate interplay between audio and video modalities in emotion prediction, showcasing the adaptability of dynamic selection methods in handling missing modalities.

Abstract (translated)

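The study of human emotions, traditionally a cornerstone of fields such as psychology and neuroscience, has been profoundly influenced by artificial intelligence (AI). Multiple channels, such as speech (voice) and facial expressions (image), are crucial to understanding human emotions. However, AI's progress in multimodal emotion recognition (MER) has faced substantial technical challenges. One important challenge is how AI models handle the absence of a particular modality, which is common in real-world situations. This study focuses on assessing the performance and resilience of two strategies when one modality is missing: a novel multimodal dynamic modality and view selection, and a cross-attention mechanism. Results on the RECOLA dataset show that dynamic selection-based methods are a promising approach for MER. In the missing-modality scenarios, all dynamic selection-based methods outperformed the baseline. The study concludes by emphasizing the complex interplay between audio and video modalities in emotion prediction and by showing the adaptability of dynamic selection methods in handling missing modalities.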

URL

https://arxiv.org/abs/2404.12251

PDF

https://arxiv.org/pdf/2404.12251.pdf
Meta-Auxiliary Learning for Micro-Expression Recognition

2024-04-18 09:21:16

Jingyao Wang, Yunhan Tian, Yuxuan Yang, Xiaoxin Chen, Changwen Zheng, Wenwen Qiang

arXiv_CV

arXiv_CV Recognition Detection Knowledge Optimization Pose Emotion
Abstract

Micro-expressions (MEs) are involuntary movements that reveal people's hidden feelings, and they have attracted considerable interest for their objectivity in emotion detection. However, despite its wide application in various scenarios, micro-expression recognition (MER) remains a challenging problem in real life for three reasons: (i) data level: lack of data and imbalanced classes; (ii) feature level: the subtle, rapidly changing, and complex features of MEs; and (iii) decision-making level: the impact of individual differences. To address these issues, we propose a dual-branch meta-auxiliary learning method, called LightmanNet, for fast and robust micro-expression recognition. Specifically, LightmanNet learns general MER knowledge from limited data through a dual-branch bi-level optimization process. (i) In the first level, it obtains task-specific MER knowledge by learning in two branches: the first branch learns MER features via primary MER tasks, while the other branch guides the model to obtain discriminative features via auxiliary tasks, i.e., image alignment between micro-expressions and macro-expressions, given their resemblance in both spatial and temporal behavioral patterns. The two branches jointly constrain the model to learn meaningful task-specific MER knowledge while avoiding noise or superficial connections between MEs and emotions that may damage its generalization ability. (ii) In the second level, LightmanNet further refines the learned task-specific knowledge, improving model generalization and efficiency. Extensive experiments on various benchmark datasets demonstrate the superior robustness and efficiency of LightmanNet.

Abstract (translated)

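Micro-expressions (MEs) are involuntary movements that reveal people's hidden feelings, and their objectivity in emotion detection has attracted wide attention. However, despite its broad application in various scenarios, micro-expression recognition (MER) remains a challenging problem in real life for three reasons: (1) data level: insufficient data and imbalanced classes; (2) feature level: the subtle, rapidly changing, and complex features of micro-expressions; (3) decision level: the influence of individual differences. To address these problems, we propose a dual-branch meta-auxiliary learning method, called LightmanNet, for fast and robust micro-expression recognition. Specifically, LightmanNet learns general MER knowledge from limited data through a dual-branch bi-level optimization process. (i) In the first level, it obtains task-specific MER knowledge through two branches: the first branch learns MER features via the primary MER tasks, while the other branch guides the model to obtain discriminative features via auxiliary tasks, namely image alignment between micro-expressions and macro-expressions, based on their similarity in spatial and temporal behavioral patterns. The two branches jointly constrain the model to learn meaningful task-specific MER knowledge while avoiding noise or superficial connections that could harm its generalization ability. (ii) In the second level, LightmanNet further refines the learned task-specific knowledge, improving the model's generalization and efficiency. Extensive experiments on various benchmark datasets show that LightmanNet achieves superior robustness and efficiency.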

URL

https://arxiv.org/abs/2404.12024

PDF

https://arxiv.org/pdf/2404.12024.pdf