Natural language generation tools are powerful and effective for generating content. However, language models are known to display bias and fairness issues, making them impractical to deploy for many use cases. Here we focus on how fairness issues impact automatically generated test content, which can have stringent requirements to ensure the test measures only what it was intended to measure. Specifically, we identify test content that is focused on particular domains and experiences that only reflect a certain demographic, or that is potentially emotionally upsetting; both could inadvertently impact a test-taker's score. Such content does not reflect typical biases when taken out of context, making it challenging even for modern models that contain safeguards. We build a dataset of 621 generated texts annotated for fairness and explore a variety of classification methods: fine-tuning, topic-based classification, and prompting, including few-shot and self-correcting prompts. We find that combining prompt self-correction and few-shot learning performs best, yielding an F1 score of .791 on our held-out test set, while much smaller BERT- and topic-based models achieve competitive performance on out-of-domain data.
https://arxiv.org/abs/2404.15104
Understanding emotions and expressions is a task of interest across multiple disciplines, especially for improving user experiences. Contrary to common perception, it has been shown that emotions are not discrete entities but instead exist along a continuum. People understand discrete emotions differently due to a variety of factors, including cultural background, individual experiences, and cognitive biases. Therefore, most approaches to expression understanding, particularly those relying on discrete categories, are inherently biased. In this paper, we present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect. Further, we propose a model for the prediction of facial expressions tailored for lightweight applications. Using a small-scale MaxViT-based model architecture, we evaluate the impact of training with discrete expression category labels alongside continuous valence and arousal labels. We show that considering valence and arousal in addition to discrete category labels significantly improves expression inference. The proposed model outperforms the current state-of-the-art models on AffectNet, establishing it as the best-performing model for inferring valence and arousal, achieving a 7% lower RMSE. Training scripts and trained weights to reproduce our results can be found here: this https URL.
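The valence-arousal evaluation above reduces to a root-mean-square error over continuous dimensions. As a minimal sketch (the values below are invented for illustration, not taken from AffectNet):

```python
import math

def rmse(predictions, targets):
    """Root-mean-square error over paired continuous values."""
    assert len(predictions) == len(targets) and predictions
    se = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(se / len(predictions))

# Illustrative (made-up) valence predictions vs. annotations in [-1, 1]
pred = [0.30, -0.10, 0.75, 0.00]
gold = [0.25, -0.20, 0.80, 0.10]
print(round(rmse(pred, gold), 4))  # → 0.0791
```

The same function applies per dimension; the 7% improvement reported in the abstract refers to RMSE computed against the dataset's valence and arousal annotations.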
https://arxiv.org/abs/2404.14975
Large Language Models (LLMs) have emerged as potent tools for advancing the United Nations' Sustainable Development Goals (SDGs). However, the attitudinal disparities between LLMs and humans towards these goals can pose significant challenges. This study conducts a comprehensive review and analysis of the existing literature on the attitudes of LLMs towards the 17 SDGs, emphasizing the comparison between their attitudes and support for each goal and those of humans. We examine the potential disparities, primarily focusing on aspects such as understanding and emotions, cultural and regional differences, task objective variations, and factors considered in the decision-making process. These disparities arise from the underrepresentation and imbalance in LLM training data, historical biases, quality issues, lack of contextual understanding, and skewed ethical values reflected. The study also investigates the risks and harms that may arise from neglecting the attitudes of LLMs towards the SDGs, including the exacerbation of social inequalities, racial discrimination, environmental destruction, and resource wastage. To address these challenges, we propose strategies and recommendations to guide and regulate the application of LLMs, ensuring their alignment with the principles and goals of the SDGs, and therefore creating a more just, inclusive, and sustainable future.
https://arxiv.org/abs/2404.13885
Watching movies is one of the social activities typically done in groups. Emotion is the most vital factor that affects movie viewers' preferences. So, the emotional aspect of the movie needs to be determined and analyzed for further recommendations. It can be challenging to choose a movie that appeals to the emotions of a diverse group. Reaching an agreement for a group can be difficult due to the various genres and choices. This paper proposes a novel approach to group movie suggestions by examining emotions from three different channels: movie descriptions (text), soundtracks (audio), and posters (image). We employ the Jaccard similarity index to match each participant's emotional preferences to prospective movie choices, followed by a fuzzy inference technique to determine group consensus. We use a weighted integration process for the fusion of emotion scores from diverse data types. Then, group movie recommendation is based on prevailing emotions and viewers' best-loved movies. After determining the recommendations, the group's consensus level is calculated using a fuzzy inference system, taking participants' feedback as input. Participants (n=130) in the survey were provided with different emotion categories and asked to select the emotions best suited for particular movies (n=12). Comparison results between predicted and actual scores demonstrate the efficiency of using emotion detection for this problem (Jaccard similarity index = 0.76). We explored the relationship between induced emotions and movie popularity as an additional experiment, analyzing emotion distribution in 100 popular movies from the TMDB database. Such systems can potentially improve the accuracy of movie recommendation systems and achieve a high level of consensus among participants with diverse preferences.
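The Jaccard similarity index used to match each participant's emotional preferences to candidate movies can be sketched as follows; the emotion labels and profiles are hypothetical, not from the study's data:

```python
def jaccard(a, b):
    """Jaccard similarity index: |intersection| / |union| of two label sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical emotion profiles: a participant's preferences vs. a movie's
participant = {"joy", "surprise", "anticipation"}
movie = {"joy", "anticipation", "fear", "sadness"}
print(jaccard(participant, movie))  # 2 shared / 5 in the union = 0.4
```

The paper aggregates such per-participant scores across the text, audio, and image channels via a weighted fusion before the fuzzy consensus step.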
https://arxiv.org/abs/2404.13778
Existing English-teaching chatbots rarely incorporate empathy explicitly in their feedback, but empathetic feedback could help keep students engaged and reduce learner anxiety. Toward this end, we propose the task of negative emotion detection via audio, for recognizing empathetic feedback opportunities in language learning. We then build the first spoken English-teaching chatbot with adaptive, empathetic feedback. This feedback is synthesized through automatic prompt optimization of ChatGPT and is evaluated with English learners. We demonstrate the effectiveness of our system through a preliminary user study.
https://arxiv.org/abs/2404.13764
Speech emotion recognition is crucial in human-computer interaction, but extracting and using emotional cues from audio poses challenges. This paper introduces MFHCA, a novel method for Speech Emotion Recognition using Multi-Spatial Fusion and Hierarchical Cooperative Attention on spectrograms and raw audio. We employ the Multi-Spatial Fusion module (MF) to efficiently identify emotion-related spectrogram regions and integrate Hubert features for higher-level acoustic information. Our approach also includes a Hierarchical Cooperative Attention module (HCA) to merge features from various auditory levels. We evaluate our method on the IEMOCAP dataset and achieve improvements of 2.6% in weighted accuracy and 1.87% in unweighted accuracy. Extensive experiments demonstrate the effectiveness of the proposed method.
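Weighted and unweighted accuracy, as conventionally defined in speech emotion recognition, can be sketched as follows (the labels below are toy examples, not IEMOCAP results):

```python
from collections import defaultdict

def weighted_accuracy(y_true, y_pred):
    """WA: overall fraction of correctly classified utterances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """UA: per-class recall averaged over classes (class-balanced)."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Toy labels over three IEMOCAP-style emotion classes
y_true = ["ang", "ang", "ang", "hap", "sad"]
y_pred = ["ang", "ang", "hap", "hap", "ang"]
print(weighted_accuracy(y_true, y_pred))            # 3/5 = 0.6
print(round(unweighted_accuracy(y_true, y_pred), 4))  # (2/3 + 1 + 0)/3 ≈ 0.5556
```

UA is the standard choice when classes are imbalanced, which is why SER papers typically report both.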
https://arxiv.org/abs/2404.13509
In this paper, we present a novel benchmark for Emotion Recognition using facial landmarks extracted from realistic news videos. Traditional methods relying on RGB images are resource-intensive, whereas our approach with Facial Landmark Emotion Recognition (FLER) offers a simplified yet effective alternative. By leveraging Graph Neural Networks (GNNs) to analyze the geometric and spatial relationships of facial landmarks, our method enhances the understanding and accuracy of emotion recognition. We discuss the advancements and challenges in deep learning techniques for emotion recognition, particularly focusing on Graph Neural Networks (GNNs) and Transformers. Our experimental results demonstrate the viability and potential of our dataset as a benchmark, setting a new direction for future research in emotion recognition technologies. The codes and models are at: this https URL
https://arxiv.org/abs/2404.13493
One persistent challenge in Speech Emotion Recognition (SER) is the ubiquitous environmental noise, which frequently results in diminished SER performance in practical use. In this paper, we introduce a Two-level Refinement Network, dubbed TRNet, to address this challenge. Specifically, a pre-trained speech enhancement module is employed for front-end noise reduction and noise level estimation. Later, we utilize clean speech spectrograms and their corresponding deep representations as reference signals to refine the spectrogram distortion and representation shift of enhanced speech during model training. Experimental results validate that the proposed TRNet substantially increases the system's robustness in both matched and unmatched noisy environments, without compromising its performance in clean environments.
https://arxiv.org/abs/2404.12979
Depression is a significant issue nowadays. As per the World Health Organization (WHO), as of 2023, over 280 million individuals were grappling with depression. This is a huge number; if not taken seriously, it will increase rapidly. About 4.89 billion individuals are social media users. People express their feelings and emotions on platforms like Twitter, Facebook, Reddit, Instagram, etc. These platforms contain valuable information that can be used for research purposes. Considerable research has been conducted across various social media platforms. However, certain limitations persist in these endeavors. In particular, previous studies focused only on detecting depression and the intensity of depression in tweets, and there were inaccuracies in dataset labeling. In this research work, five types of depression (bipolar, major, psychotic, atypical, and postpartum) were predicted using tweets from the Twitter database based on lexicon labeling. Explainable AI was used to provide reasoning by highlighting the parts of tweets that represent the type of depression. Bidirectional Encoder Representations from Transformers (BERT) was used for feature extraction and training. Machine learning and deep learning methodologies were used to train the model. The BERT model presented the most promising results, achieving an overall accuracy of 0.96.
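Lexicon labeling of the kind described might look like the sketch below; the keyword lists are invented for illustration and are not the lexicon used in the study:

```python
# Illustrative lexicon; the terms are hypothetical, not the study's actual
# lexicon, and real lexicons would be far larger and clinically validated.
LEXICON = {
    "bipolar": {"mood swings", "manic", "euphoric"},
    "postpartum": {"after birth", "newborn", "baby blues"},
    "psychotic": {"hearing voices", "hallucination"},
}

def label_tweet(text):
    """Return depression types whose lexicon terms appear in the tweet."""
    text = text.lower()
    return sorted(t for t, terms in LEXICON.items()
                  if any(term in text for term in terms))

print(label_tweet("Feeling manic all week, wild mood swings"))  # ['bipolar']
```

Labels produced this way then serve as (noisy) supervision for the BERT classifier, with explainable-AI highlighting used to surface the trigger spans.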
https://arxiv.org/abs/2404.13104
This study explores a new method in food development by utilizing AI including generative AI, aiming to craft products that delight the senses and resonate with consumers' emotions. The food ingredient recommendation approach used in this study can be considered as a form of multimodal generation in a broad sense, as it takes text as input and outputs food ingredient candidates. This study focused on producing "Romance Bread," a collection of breads infused with flavors that reflect the nuances of a romantic Japanese television program. We analyzed conversations from TV programs and lyrics from songs featuring fruits and sweets to recommend ingredients that express romantic feelings. Based on these recommendations, the bread developers then considered the flavoring of the bread and developed new bread varieties. The research included a tasting evaluation involving 31 participants and interviews with the product developers. Findings indicate a notable correlation between tastes generated by AI and human preferences. This study validates the concept of using AI in food innovation and highlights the broad potential for developing unique consumer experiences that focus on emotional engagement through AI and human collaboration.
https://arxiv.org/abs/2404.12760
On social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated on two tasks: distinguishing high and low suicide risk, and fine-grained suicide risk classification on a scale of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with a weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced, large-language-model-based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65 percentage points in F1 score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for the automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.
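The weighted F1 metric reported for the fine-grained task averages per-class F1 scores weighted by class support. A minimal sketch (toy risk labels, not the study's data):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to class support."""
    classes = set(y_true) | set(y_pred)
    support = Counter(y_true)
    total, score = len(y_true), 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += support[c] / total * f1
    return score

# Toy 0-10 risk levels (collapsed to three levels for brevity)
y_true = [0, 0, 5, 10, 10]
y_pred = [0, 5, 5, 10, 5]
print(round(weighted_f1(y_true, y_pred), 4))  # → 0.6333
```

Because weighting follows class support, a model can score well here while still failing on the rare, high-urgency classes, which motivates the paper's data augmentation experiments.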
https://arxiv.org/abs/2404.12659
In this paper, we propose a new Multimodal Representation Learning (MRL) method for Multimodal Sentiment Analysis (MSA), which facilitates the adaptive interaction between modalities through Cooperative Sentiment Agents, named Co-SA. Co-SA comprises two critical components: the Sentiment Agents Establishment (SAE) phase and the Sentiment Agents Cooperation (SAC) phase. During the SAE phase, each sentiment agent deals with an unimodal signal and highlights explicit dynamic sentiment variations within the modality via the Modality-Sentiment Disentanglement (MSD) and Deep Phase Space Reconstruction (DPSR) modules. Subsequently, in the SAC phase, Co-SA meticulously designs task-specific interaction mechanisms for sentiment agents, coordinating multimodal signals to learn the joint representation. Specifically, Co-SA equips an independent policy model for each sentiment agent that captures significant properties within the modality. These policies are jointly optimized through a unified reward adapted to downstream tasks. Benefitting from the rewarding mechanism, Co-SA transcends the limitation of pre-defined fusion modes and adaptively captures unimodal properties for MRL in the multimodal interaction setting. To demonstrate the effectiveness of Co-SA, we apply it to Multimodal Sentiment Analysis (MSA) and Multimodal Emotion Recognition (MER) tasks. Our comprehensive experimental results demonstrate that Co-SA excels at discovering diverse cross-modal features, encompassing both common and complementary aspects. The code is available at this https URL.
https://arxiv.org/abs/2404.12642
This study introduces a novel method for irony detection, applying Large Language Models (LLMs) with prompt-based learning to facilitate emotion-centric text augmentation. Traditional irony detection techniques typically fall short due to their reliance on static linguistic features and predefined knowledge bases, often overlooking the nuanced emotional dimensions integral to irony. In contrast, our methodology augments the detection process by integrating subtle emotional cues, augmented through LLMs, into three benchmark pre-trained NLP models - BERT, T5, and GPT-2 - which are widely recognized as foundational in irony detection. We assessed our method using the SemEval-2018 Task 3 dataset and observed substantial enhancements in irony detection capabilities.
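The abstract does not specify the exact prompts used; a hypothetical template for LLM-driven, emotion-centric augmentation might look like the following sketch (the wording and the `build_augmentation_prompt` helper are both illustrative assumptions):

```python
# Hypothetical prompt template; the paper's actual prompt is not given in
# the abstract, so this only illustrates the general shape of the approach.
PROMPT = (
    "Rewrite the following tweet so that the implied emotion "
    "({emotion}) is made explicit, preserving the ironic tone:\n\n"
    "Tweet: {text}\nRewritten:"
)

def build_augmentation_prompt(text, emotion):
    """Fill the template; the result would be sent to an LLM."""
    return PROMPT.format(text=text, emotion=emotion)

prompt = build_augmentation_prompt("Great, another Monday.", "frustration")
print(prompt.startswith("Rewrite the following tweet"))  # True
```

The LLM's rewrites would then be added to the training data for the downstream BERT, T5, and GPT-2 detectors.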
https://arxiv.org/abs/2404.12291
Facial expression recognition is a pivotal component in machine learning, facilitating various applications. However, convolutional neural networks (CNNs) are often plagued by catastrophic forgetting, impeding their adaptability. The proposed method, emotion-centered generative replay (ECgr), tackles this challenge by integrating synthetic images from generative adversarial networks. Moreover, ECgr incorporates a quality assurance algorithm to ensure the fidelity of generated images. This dual approach enables CNNs to retain past knowledge while learning new tasks, enhancing their performance in emotion recognition. The experimental results on four diverse facial expression datasets demonstrate that incorporating images generated by our pseudo-rehearsal method enhances training on the targeted dataset and the source dataset while making the CNN retain previously learned knowledge.
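The pseudo-rehearsal loop with a quality gate might be sketched as follows; `generator`, `quality_ok`, and `train_step` are stand-ins (the paper's GAN, quality-assurance algorithm, and CNN update are not specified in the abstract):

```python
import random

# Minimal sketch of generative replay with a quality gate. All callables
# are hypothetical stand-ins for the components named in the abstract.
def replay_training(new_batches, generator, quality_ok, train_step,
                    replay_ratio=0.5):
    for batch in new_batches:
        # Draw synthetic samples of previously learned emotions and keep
        # only those that pass the quality-assurance check.
        replay = [s for s in (generator() for _ in range(len(batch)))
                  if quality_ok(s)]
        k = int(len(batch) * replay_ratio)
        mixed = batch + replay[:k]  # interleave old (synthetic) and new data
        random.shuffle(mixed)
        train_step(mixed)
```

Mixing vetted synthetic samples of past classes into each new-task batch is what lets the CNN retain earlier knowledge without storing the original data.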
https://arxiv.org/abs/2404.12260
The study of human emotions, traditionally a cornerstone in fields like psychology and neuroscience, has been profoundly impacted by the advent of artificial intelligence (AI). Multiple channels, such as speech (voice) and facial expressions (image), are crucial in understanding human emotions. However, AI's journey in multimodal emotion recognition (MER) is marked by substantial technical challenges. One significant hurdle is how AI models manage the absence of a particular modality, a frequent occurrence in real-world situations. This study's central focus is assessing the performance and resilience of two strategies when confronted with the lack of one modality: a novel dynamic modality and view selection method, and a cross-attention mechanism. Results on the RECOLA dataset show that dynamic selection-based methods are a promising approach for MER. In the missing-modality scenarios, all dynamic selection-based methods outperformed the baseline. The study concludes by emphasizing the intricate interplay between audio and video modalities in emotion prediction, showcasing the adaptability of dynamic selection methods in handling missing modalities.
https://arxiv.org/abs/2404.12251
Micro-expressions (MEs) are involuntary movements that reveal people's hidden feelings, and they have attracted considerable interest for their objectivity in emotion detection. However, despite its wide applicability, micro-expression recognition (MER) remains a challenging problem in real life for three reasons: (i) data level: lack of data and imbalanced classes; (ii) feature level: the subtle, rapidly changing, and complex features of MEs; and (iii) decision-making level: the impact of individual differences. To address these issues, we propose a dual-branch meta-auxiliary learning method, called LightmanNet, for fast and robust micro-expression recognition. Specifically, LightmanNet learns general MER knowledge from limited data through a dual-branch bi-level optimization process: (i) In the first level, it obtains task-specific MER knowledge by learning in two branches, where the first branch learns MER features via primary MER tasks, while the other guides the model to obtain discriminative features via an auxiliary task, i.e., image alignment between micro-expressions and macro-expressions, given their resemblance in both spatial and temporal behavioral patterns. The two branches jointly constrain the model to learn meaningful task-specific MER knowledge while avoiding noise or superficial connections between MEs and emotions that could damage its generalization ability. (ii) In the second level, LightmanNet further refines the learned task-specific knowledge, improving model generalization and efficiency. Extensive experiments on various benchmark datasets demonstrate the superior robustness and efficiency of LightmanNet.
https://arxiv.org/abs/2404.12024
Text animation serves as an expressive medium, transforming static communication into dynamic experiences by infusing words with motion to evoke emotions, emphasize meanings, and construct compelling narratives. Crafting animations that are semantically aware poses significant challenges, demanding expertise in graphic design and animation. We present an automated text animation scheme, termed "Dynamic Typography", which combines two challenging tasks. It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts. Our technique harnesses vector graphics representations and an end-to-end optimization-based framework. This framework employs neural displacement fields to convert letters into base shapes and applies per-frame motion, encouraging coherence with the intended textual concept. Shape preservation techniques and perceptual loss regularization are employed to maintain legibility and structural integrity throughout the animation process. We demonstrate the generalizability of our approach across various text-to-video models and highlight the superiority of our end-to-end methodology over baseline methods, which might comprise separate tasks. Through quantitative and qualitative evaluations, we demonstrate the effectiveness of our framework in generating coherent text animations that faithfully interpret user prompts while maintaining readability. Our code is available at: this https URL.
https://arxiv.org/abs/2404.11614
Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care. In current society, individuals frequently express negative emotions on social media on specific topics, often exhibiting cognitive distortions, including suicidal behaviors in extreme cases. Yet, there is a notable absence of methodologies for analyzing cognitive pathways that could aid psychotherapists in conducting effective interventions online. In this study, we gathered data from social media and established the task of extracting cognitive pathways, annotating the data based on a cognitive theoretical framework. We initially categorized the task of extracting cognitive pathways as a hierarchical text classification with four main categories and nineteen subcategories. Following this, we structured a text summarization task to help psychotherapists quickly grasp the essential information. Our experiments evaluate the performance of deep learning and large language models (LLMs) on these tasks. The results demonstrate that our deep learning method achieved a micro-F1 score of 62.34% in the hierarchical text classification task. Meanwhile, in the text summarization task, GPT-4 attained a Rouge-1 score of 54.92 and a Rouge-2 score of 30.86, surpassing the experimental deep learning model's performance. However, it may suffer from an issue of hallucination. We have made all models and codes publicly available to support further research in this field.
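The Rouge-1 score reported for the summarization task measures unigram overlap between a candidate summary and a reference. A simplified, recall-only sketch (full implementations also report precision and F-measure, and the example sentences are invented):

```python
from collections import Counter

def rouge1(candidate, reference):
    """Recall-oriented Rouge-1: overlapping unigrams / reference length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped counts per token
    return overlap / sum(ref.values())

ref = "patient reports persistent negative thoughts about work"
cand = "persistent negative thoughts about work pressure"
print(round(rouge1(cand, ref), 4))  # 5 of 7 reference tokens → 0.7143
```

Rouge-2 is the same computation over bigrams; the high Rouge scores for GPT-4 should be read alongside the noted hallucination risk, since n-gram overlap does not verify factuality.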
https://arxiv.org/abs/2404.11449
Automated dialogue systems are important applications of artificial intelligence, and traditional systems struggle to understand user emotions and provide empathetic feedback. This study integrates emotional intelligence technology into automated dialogue systems and creates a dialogue generation model with emotional intelligence through deep learning and natural language processing techniques. The model can detect and understand a wide range of emotions and specific pain signals in real time, enabling the system to provide empathetic interaction. By integrating the results of the study "Can artificial intelligence detect pain and express pain empathy?", the model's ability to understand the subtle elements of pain empathy has been enhanced, setting higher standards for emotional intelligence dialogue systems. The project aims to provide theoretical understanding and practical suggestions to integrate advanced emotional intelligence capabilities into dialogue systems, thereby improving user experience and interaction quality.
https://arxiv.org/abs/2404.11447
The advent of deep learning models has made a considerable contribution to the achievement of Emotion Recognition in Conversation (ERC). However, this task still remains an important challenge due to the plurality and subjectivity of human emotions. Previous work on ERC provides predictive models using mostly graph-based conversation representations. In this work, we propose a way to model the conversational context that we incorporate into a metric learning training strategy, with a two-step process. This allows us to perform ERC in a flexible classification scenario and to end up with a lightweight yet efficient model. Using metric learning through a Siamese Network architecture, we achieve a macro F1 score of 57.71 for emotion classification in conversation on the DailyDialog dataset, which outperforms the related work. This state-of-the-art result is promising regarding the use of metric learning for emotion recognition, yet leaves room for improvement compared to the micro F1 score obtained.
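A Siamese architecture trained with metric learning typically uses a pairwise objective such as the contrastive loss; the paper's exact loss is not given in the abstract, so the following is only an illustrative sketch with toy embeddings:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, same_emotion, margin=1.0):
    """Standard contrastive loss: pull same-emotion embeddings together,
    push different-emotion embeddings at least `margin` apart."""
    d = euclidean(u, v)
    if same_emotion:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy 2-d utterance embeddings (invented for illustration)
joy_a, joy_b, anger = [0.9, 0.1], [0.8, 0.2], [0.1, 0.9]
print(contrastive_loss(joy_a, joy_b, True))   # small: same class, already close
print(contrastive_loss(joy_a, anger, False))  # zero: already beyond the margin
```

At inference, classification in such a setup reduces to nearest-neighbor lookup in the learned embedding space, which is what keeps the resulting model lightweight.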
https://arxiv.org/abs/2404.11141