In social media, neural network models have been applied to hate speech detection, sentiment analysis, etc., but neural network models are susceptible to adversarial attacks. For instance, in a text classification task, the attacker elaborately introduces perturbations to the original texts that hardly alter the original semantics in order to trick the model into making different predictions. By studying textual adversarial attack methods, the robustness of language models can be evaluated and then improved. Currently, most of the research in this field focuses on English, and there is also a certain amount of research on Chinese. However, there is little research targeting Chinese minority languages. With the rapid development of artificial intelligence technology and the emergence of Chinese minority language models, textual adversarial attacks become a new challenge for the information processing of Chinese minority languages. In response to this situation, we propose a multi-granularity Tibetan textual adversarial attack method based on masked language models called TSTricker. We utilize the masked language models to generate candidate substitution syllables or words, adopt the scoring mechanism to determine the substitution order, and then conduct the attack method on several fine-tuned victim models. The experimental results show that TSTricker reduces the accuracy of the classification models by more than 28.70% and makes the classification models change the predictions of more than 90.60% of the samples, which has an evidently higher attack effect than the baseline method.
https://arxiv.org/abs/2412.02343
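The attack loop described in the abstract above (generate candidate substitutions, score positions to fix a substitution order, then substitute until the prediction flips) can be sketched roughly as follows. This is a minimal illustration, not TSTricker itself: `toy_prob_fn` and `toy_candidates_fn` are hypothetical stand-ins for a fine-tuned victim model and a masked language model, and the leave-one-out scoring is a common choice assumed here, since the abstract does not specify the scoring mechanism.

```python
from typing import Callable, List, Sequence

ProbFn = Callable[[Sequence[str]], List[float]]
CandFn = Callable[[Sequence[str], int], List[str]]


def importance_scores(tokens: List[str], prob_fn: ProbFn, label: int) -> List[float]:
    """Leave-one-out score: drop in the true-label probability when a token is removed."""
    base = prob_fn(tokens)[label]
    return [base - prob_fn(tokens[:i] + tokens[i + 1:])[label] for i in range(len(tokens))]


def greedy_attack(tokens: Sequence[str], prob_fn: ProbFn, candidates_fn: CandFn, label: int) -> List[str]:
    """Substitute tokens in descending importance order until the predicted label changes."""
    tokens = list(tokens)
    scores = importance_scores(tokens, prob_fn, label)
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    for i in order:
        best_tok, best_p = tokens[i], prob_fn(tokens)[label]
        for cand in candidates_fn(tokens, i):
            p = prob_fn(tokens[:i] + [cand] + tokens[i + 1:])[label]
            if p < best_p:  # keep the candidate that hurts the true label most
                best_tok, best_p = cand, p
        tokens[i] = best_tok
        probs = prob_fn(tokens)
        if max(range(len(probs)), key=probs.__getitem__) != label:
            break  # prediction flipped: attack succeeded
    return tokens


# Hypothetical victim model: probability of class 1 grows with "positive" tokens.
POSITIVE = {"great", "good"}


def toy_prob_fn(tokens: Sequence[str]) -> List[float]:
    hits = sum(t in POSITIVE for t in tokens)
    p_pos = min(0.9, 0.3 + 0.3 * hits)
    return [1.0 - p_pos, p_pos]


# Hypothetical candidate generator; a real attack would query a masked language model.
TOY_CANDIDATES = {"great": ["fine"], "good": ["okay"]}


def toy_candidates_fn(tokens: Sequence[str], i: int) -> List[str]:
    return TOY_CANDIDATES.get(tokens[i], [])


adversarial = greedy_attack(["a", "great", "good", "film"], toy_prob_fn, toy_candidates_fn, label=1)
```

With a real victim model, `prob_fn` would wrap the classifier's softmax output and `candidates_fn` would return the masked language model's top predictions for position `i`.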
Recently, Large Language Models (LLMs) have garnered increasing attention in the field of natural language processing, revolutionizing numerous downstream tasks with powerful reasoning and generation abilities. For example, In-Context Learning (ICL) introduces a fine-tuning-free paradigm, allowing out-of-the-box LLMs to execute downstream tasks by analogy learning without any fine-tuning. In the fine-tuning-dependent paradigm, where substantial training data exists, Parameter-Efficient Fine-Tuning (PEFT) methods offer a cost-effective way for LLMs to achieve performance comparable to full fine-tuning. However, these techniques have not been fully exploited in the aspect-based sentiment analysis (ABSA) field. Previous works probe LLMs in ABSA by merely using randomly selected input-output pairs as demonstrations in ICL, resulting in an incomplete and superficial evaluation. In this paper, we conduct a comprehensive evaluation of LLMs in the ABSA field, involving 13 datasets, 8 ABSA subtasks, and 6 LLMs. Specifically, we design a unified task formulation to unify "multiple LLMs for multiple ABSA subtasks in multiple paradigms." For the fine-tuning-dependent paradigm, we efficiently fine-tune LLMs using instruction-based multi-task learning. For the fine-tuning-free paradigm, we propose 3 demonstration selection strategies to stimulate the few-shot abilities of LLMs. Our extensive experiments demonstrate that LLMs achieve a new state-of-the-art performance compared to fine-tuned Small Language Models (SLMs) in the fine-tuning-dependent paradigm. More importantly, in the fine-tuning-free paradigm where SLMs are ineffective, LLMs with ICL still showcase impressive potential and even compete with fine-tuned SLMs on some ABSA subtasks.
https://arxiv.org/abs/2412.02279
Social media platforms, particularly Reddit's r/Epilepsy community, offer a unique perspective into the experiences of people with epilepsy (PWE) and their caregivers. This study analyzes 57k posts and 533k comments to explore key themes across demographics such as age, gender, and relationships. Our findings highlight significant discussions on epilepsy-related challenges, including depression (with 39.75% of posts indicating severe symptoms), driving restrictions, workplace concerns, and pregnancy-related issues in women with epilepsy. We introduce a novel engagement metric, F(P), which incorporates post length, sentiment scores, and readability to quantify community interaction. This analysis underscores the importance of integrated care addressing both neurological and mental health challenges faced by PWE. The insights from this study inform strategies for targeted support and awareness interventions.
https://arxiv.org/abs/2412.01692
Artificial Intelligence (AI) is transforming diverse societal domains, raising critical questions about its risks and benefits and the misalignments between public expectations and academic visions. This study examines how the general public (N=1110) -- people using or being affected by AI -- and academic AI experts (N=119) -- people shaping AI development -- perceive AI's capabilities and impact across 71 scenarios, including sustainability, healthcare, job performance, societal divides, art, and warfare. Participants evaluated each scenario on four dimensions: expected probability, perceived risk and benefit, and overall sentiment (or value). The findings reveal significant quantitative differences: experts anticipate higher probabilities, perceive lower risks, report greater utility, and express more favorable sentiment toward AI compared to the non-experts. Notably, risk-benefit tradeoffs differ: the public assigns risk half the weight of benefits, while experts assign it only a third. Visual maps of these evaluations highlight areas of convergence and divergence, identifying potential sources of public concern. These insights offer actionable guidance for researchers and policymakers to align AI development with societal values, fostering public trust and informed governance.
https://arxiv.org/abs/2412.01459
Deep learning has achieved remarkable success in processing and managing unstructured data. However, its "black box" nature imposes significant limitations, particularly in sensitive application domains. While existing interpretable machine learning methods address some of these issues, they often fail to adequately consider feature correlations and provide insufficient evaluation of model decision paths. To overcome these challenges, this paper introduces Real Explainer (RealExp), an interpretability computation method that decouples the Shapley Value into individual feature importance and feature correlation importance. By incorporating feature similarity computations, RealExp enhances interpretability by precisely quantifying both individual feature contributions and their interactions, leading to more reliable and nuanced explanations. Additionally, this paper proposes a novel interpretability evaluation criterion focused on elucidating the decision paths of deep learning models, going beyond traditional accuracy-based metrics. Experimental validations on two unstructured data tasks -- image classification and text sentiment analysis -- demonstrate that RealExp significantly outperforms existing methods in interpretability. Case studies further illustrate its practical value: in image classification, RealExp aids in selecting suitable pre-trained models for specific tasks from an interpretability perspective; in text classification, it enables the optimization of models and approximates the performance of a fine-tuned GPT-Ada model using traditional bag-of-words approaches.
https://arxiv.org/abs/2412.01365
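For context on the abstract above, the quantity RealExp starts from is the standard Shapley value. The sketch below computes exact Shapley values for a tiny hypothetical value function with one pairwise interaction. It does not reproduce RealExp's decoupling, which the abstract does not specify; it only illustrates that the plain Shapley value folds an interaction term into the individual attributions, which is the kind of mixing a separate correlation-importance term would aim to expose. The feature names and `toy_value` are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str], value_fn: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (exponential in len(players))."""
    n = len(players)
    phi: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi


# Hypothetical value function: additive feature effects plus one a-b interaction.
INDIVIDUAL = {"a": 1.0, "b": 2.0, "c": 3.0}
INTERACTION_AB = 2.0


def toy_value(coalition: FrozenSet[str]) -> float:
    value = sum(INDIVIDUAL[f] for f in coalition)
    if {"a", "b"} <= coalition:
        value += INTERACTION_AB
    return value


phi = shapley_values(["a", "b", "c"], toy_value)
# The interaction bonus of 2.0 is split evenly: a -> 1 + 1, b -> 2 + 1, c -> 3.
```

Exact enumeration is only practical for a handful of features; larger models require sampling or model-specific approximations.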
As a fine-grained task, multimodal aspect-based sentiment analysis (MABSA) mainly focuses on identifying aspect-level sentiment information in a text-image pair. However, we observe that it is difficult to recognize the sentiment of aspects in low-quality samples, such as those with low-resolution images that tend to contain noise. In the real world, data quality usually varies across samples; this kind of noise is called data uncertainty. Previous works on the MABSA task treat samples of different quality with the same importance and ignore the influence of data uncertainty. In this paper, we propose a novel data uncertainty-aware multimodal aspect-based sentiment analysis approach, UA-MABSA, which weights the loss of different samples by data quality and difficulty. UA-MABSA adopts a novel quality assessment strategy that takes into account both the image quality and the aspect-based cross-modal relevance, thus enabling the model to pay more attention to high-quality and challenging samples. Extensive experiments show that our method achieves state-of-the-art (SOTA) performance on the Twitter-2015 dataset. Further analysis demonstrates the effectiveness of the quality assessment strategy.
https://arxiv.org/abs/2412.01249
Recently, generative pre-training based models have demonstrated remarkable results on the Aspect-based Sentiment Analysis (ABSA) task. However, previous works overemphasize crafting various templates to paraphrase training targets for enhanced decoding, ignoring internal optimizations of the generative models. Despite the notable results achieved by these target-oriented optimization methods, they struggle with complicated long texts, since implicit long-distance relations, e.g., the aspect-opinion relation, are difficult to extract under the position embedding mechanism in generative models. Thus, in this paper, we first clarify the causes of the problem and introduce two sequence optimization strategies: rule-based static optimization and score-based dynamic optimization. The rule-based approach relies on a handcrafted priority of dependency relations to reorder the context, while the score-based algorithm dynamically regulates the contextual sequence by calculating word position scores using a neural network. Based on the dynamic optimization structure, we further propose a unified Prompt-based Generative Sequence Optimization network (named PGSO), which jointly optimizes the training target as well as the generative model. Specifically, PGSO contains two components, namely, prompt construction and sequence regulator. The former constructs a task-specific prompt based on unsupervised training objects to fully utilize the pre-trained model. The latter jointly leverages semantic, syntactic and original-sequence information to dynamically regulate the contextual sequence. Our experiments conducted on four ABSA tasks across multiple benchmarks indicate that PGSO outperforms state-of-the-art methods, with an average improvement of 3.52% in F1 score.
https://arxiv.org/abs/2412.00763
Multimodal sarcasm detection (MSD) is essential for various downstream tasks. Existing MSD methods tend to rely on spurious correlations. These methods often mistakenly prioritize non-essential features yet still make correct predictions, demonstrating poor generalizability beyond training environments. Regarding this phenomenon, this paper undertakes several initiatives. Firstly, we identify two primary causes that lead to the reliance on spurious correlations. Secondly, we address these challenges by proposing a novel method that integrates Multimodal Incongruities via Contrastive Learning (MICL) for multimodal sarcasm detection. Specifically, we first leverage incongruity to drive multi-view learning from three views: token-patch, entity-object, and sentiment. Then, we introduce extensive data augmentation to mitigate the biased learning of the textual modality. Additionally, we construct a test set, SPMSD, which consists of potential spurious correlations, to evaluate the model's generalizability. Experimental results demonstrate the superiority of MICL on benchmark datasets, and further analyses show MICL's advantage in mitigating the effect of spurious correlations.
https://arxiv.org/abs/2412.00756
Anti-Muslim hate speech has emerged within memes, characterized by context-dependent and rhetorical messages using text and images that seemingly mimic humor but convey Islamophobic sentiments. This work presents a novel dataset and proposes a classifier based on the Vision-and-Language Transformer (ViLT) specifically tailored to identify anti-Muslim hate within memes by integrating both visual and textual representations. Our model leverages joint modal embeddings between meme images and incorporated text to capture nuanced Islamophobic narratives that are unique to meme culture, providing both high detection accuracy and interoperability.
https://arxiv.org/abs/2412.00681
Aspect-Opinion Pair Extraction (AOPE) and Aspect Sentiment Triplet Extraction (ASTE) have gained significant attention in natural language processing. However, most existing methods follow a pipelined framework that extracts aspects/opinions and identifies their relations separately, leading to error propagation and high time complexity. To address this problem, we propose a transition-based pipeline to mitigate token-level bias and capture position-aware aspect-opinion relations. With the use of a fused dataset and contrastive learning optimization, our model learns robust action patterns and can optimize separate subtasks jointly, often with linear-time complexity. The results show that our model achieves the best performance on both the ASTE and AOPE tasks, outperforming the state-of-the-art methods by at least 6.98% in the F1 measure. The code is available at this https URL.
https://arxiv.org/abs/2412.00208
Accurate prediction of stock market trends is crucial for informed investment decisions and effective portfolio management, ultimately leading to enhanced wealth creation and risk mitigation. This study proposes a novel approach for predicting stock prices in the stock market by integrating Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, using sentiment analysis of social network data and candlestick data (price). The proposed methodology consists of two primary components: sentiment analysis of social network and candlestick data. By amalgamating candlestick data with insights gleaned from Twitter, this approach facilitates a more detailed and accurate examination of market trends and patterns, ultimately leading to more effective stock price predictions. Additionally, a Random Forest algorithm is used to classify tweets as either positive or negative, allowing for a more subtle and informed assessment of market sentiment. This study uses CNN and LSTM networks to predict stock prices. The CNN extracts short-term features, while the LSTM models long-term dependencies. The integration of both networks enables a more comprehensive analysis of market trends and patterns, leading to more accurate stock price predictions.
https://arxiv.org/abs/2411.19766
The complexity of stocks and industries presents challenges for stock prediction. Currently, stock prediction models can be divided into two categories. One category, represented by GRU and ALSTM, relies solely on stock factors for prediction, with limited effectiveness. The other category, represented by HIST and TRA, incorporates not only stock factors but also industry information, industry financial reports, public sentiment, and other inputs for prediction. The second category of models can capture correlations between stocks by introducing additional information, but the extra data is difficult to standardize and generalize. Considering the current state and limitations of these two types of models, this paper proposes the GRU-PFG (Project Factors into Graph) model. This model takes only stock factors as input and extracts inter-stock correlations using graph neural networks. It achieves prediction results that not only outperform the other models that rely solely on stock factors, but are also comparable to those of the second category of models. The experimental results show that on the CSI300 dataset, the IC of GRU-PFG is 0.134, outperforming HIST's 0.131 and significantly surpassing GRU and Transformer, achieving results better than the second category of models. Moreover, as a model that relies solely on stock factors, it has greater potential for generalization.
https://arxiv.org/abs/2411.18997
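The IC figures quoted in the abstract above (0.134 versus 0.131) are presumably information coefficients. A common convention, assumed here because the abstract does not define the metric, is the Pearson correlation between predicted scores and realized returns across stocks on a given day, averaged over days. The function names and sample numbers below are hypothetical.

```python
from math import sqrt
from typing import Sequence


def information_coefficient(predicted: Sequence[float], realized: Sequence[float]) -> float:
    """Pearson correlation between predicted scores and realized returns for one day."""
    n = len(predicted)
    if n != len(realized) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_r = sum(realized) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, realized))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in realized)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_r)


def mean_ic(daily_predictions: Sequence[Sequence[float]], daily_returns: Sequence[Sequence[float]]) -> float:
    """Average the per-day IC over all days."""
    ics = [information_coefficient(p, r) for p, r in zip(daily_predictions, daily_returns)]
    return sum(ics) / len(ics)


# Hypothetical data: day 1 is ranked perfectly, day 2 is ranked in reverse.
day1_ic = information_coefficient([0.1, 0.2, 0.3], [1.0, 2.0, 3.0])
day2_ic = information_coefficient([0.1, 0.2, 0.3], [3.0, 2.0, 1.0])
average_ic = mean_ic([[0.1, 0.2, 0.3], [0.1, 0.2, 0.3]], [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```

Under this convention an IC near 0.13 indicates a weak but positive linear relation between predictions and returns, which is why differences in the third decimal place are reported.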
Thirteen years after the Fukushima Daiichi nuclear power plant accident, Japan's nuclear energy accounts for only approximately 6% of electricity production, as most nuclear plants remain shut down. To revitalize the nuclear industry and achieve sustainable development goals, effective communication with Japanese citizens, grounded in an accurate understanding of public sentiment, is of paramount importance. While nationwide surveys have traditionally been used to gauge public views, the rise of social media in recent years has provided a promising new avenue for understanding public sentiment. To explore domestic sentiment on nuclear energy-related issues expressed online, we analyzed the content and comments of over 3,000 YouTube videos covering topics related to nuclear energy. Topic modeling was used to extract the main topics from the videos, and sentiment analysis with large language models classified user sentiments towards each topic. Additionally, word co-occurrence network analysis was performed to examine the shift in online discussions during August and September 2023 regarding the release of treated water. Overall, our results provide valuable insights into the online discourse on nuclear energy and contribute to a more comprehensive understanding of public sentiment in Japan.
https://arxiv.org/abs/2411.18383
Large Language Models (LLMs) have strong expressive capabilities, and generative models effectively capture sentiment structures and deep semantics; however, challenges remain in fine-grained sentiment classification across multilingual and complex contexts. To address this, we propose the Sentiment Cross-Lingual Recognition and Logic Framework (SentiXRL), which incorporates two modules: an emotion retrieval enhancement module that improves sentiment classification accuracy in complex contexts through historical dialogue and logical reasoning, and a self-circulating analysis negotiation mechanism (SANM) that facilitates autonomous decision-making within a single model for classification. We have validated SentiXRL's superiority on multiple standard datasets, outperforming existing models on CPED and CH-SIMS, and achieving overall better performance on MELD, EmoryNLP, and IEMOCAP. Notably, we unified labels across several fine-grained sentiment annotation datasets and conducted category confusion experiments, revealing challenges and impacts of class imbalance in standard datasets.
https://arxiv.org/abs/2411.18162
Natural Language Processing (NLP) for low-resource languages presents significant challenges, particularly due to the scarcity of high-quality annotated data and linguistic resources. The choice of embeddings plays a critical role in enhancing the performance of NLP tasks, such as news classification, sentiment analysis, and hate speech detection, especially for low-resource languages like Marathi. In this study, we investigate the impact of three embedding techniques (contextual BERT-based, non-contextual BERT-based, and FastText-based) on NLP classification tasks specific to the Marathi language. Our research includes a thorough evaluation of both compressed and uncompressed embeddings, providing a comprehensive overview of how these embeddings perform across different scenarios. Specifically, we compare two BERT model embeddings, MuRIL and MahaBERT, as well as two FastText model embeddings, IndicFT and MahaFT. Our evaluation includes applying embeddings to a Multiple Logistic Regression (MLR) classifier for task performance assessment, as well as t-SNE visualizations to observe the spatial distribution of these embeddings. The results demonstrate that contextual embeddings outperform non-contextual embeddings. Furthermore, BERT-based non-contextual embeddings extracted from the first BERT embedding layer yield better results than FastText-based embeddings, suggesting a potential alternative to FastText embeddings.
https://arxiv.org/abs/2411.17661
Low-resource languages face significant challenges due to the lack of sufficient linguistic data, resources, and tools for tasks such as supervised learning, annotation, and classification. This shortage hinders the development of accurate models and datasets, making it difficult to perform critical NLP tasks like sentiment analysis or hate speech detection. To bridge this gap, Large Language Models (LLMs) offer an opportunity as potential annotators capable of generating datasets and resources for these underrepresented languages. In this paper, we focus on Marathi, a low-resource language, and evaluate the performance of both closed-source and open-source LLMs as annotators. We assess models such as GPT-4o, Gemini 1.0 Pro, Gemma 2 (2B and 9B), and Llama 3.1 (8B) on classification tasks including sentiment analysis, news classification, and hate speech detection. Our findings reveal that while LLMs excel in annotation tasks for high-resource languages like English, they still fall short when applied to Marathi. Even advanced closed models like Gemini and GPT underperform in comparison to BERT-based baselines, highlighting the limitations of LLMs as annotators for low-resource languages.
https://arxiv.org/abs/2411.17637
Detecting user frustration in modern-day task-oriented dialog (TOD) systems is imperative for maintaining overall user satisfaction, engagement, and retention. However, most recent research is focused on sentiment and emotion detection in academic settings, thus failing to fully encapsulate implications of real-world user data. To mitigate this gap, in this work, we focus on user frustration in a deployed TOD system, assessing the feasibility of out-of-the-box solutions for user frustration detection. Specifically, we compare the performance of our deployed keyword-based approach, open-source approaches to sentiment analysis, dialog breakdown detection methods, and emerging in-context learning LLM-based detection. Our analysis highlights the limitations of open-source methods for real-world frustration detection, while demonstrating the superior performance of the LLM-based approach, achieving a 16% relative improvement in F1 score on an internal benchmark. Finally, we analyze advantages and limitations of our methods and provide an insight into user frustration detection task for industry practitioners.
https://arxiv.org/abs/2411.17437
Multimodal large language models (MLLMs) have shown remarkable progress in high-level semantic tasks such as visual question answering, image captioning, and emotion recognition. However, despite these advancements, there remains a lack of standardized benchmarks for evaluating MLLM performance in multi-object sentiment analysis, a key task in semantic understanding. To address this gap, we introduce MOSABench, a novel evaluation dataset designed specifically for multi-object sentiment analysis. MOSABench includes approximately 1,000 images with multiple objects, requiring MLLMs to independently assess the sentiment of each object, thereby reflecting real-world complexities. Key innovations in MOSABench include distance-based target annotation, post-processing for evaluation to standardize outputs, and an improved scoring mechanism. Our experiments reveal notable limitations in current MLLMs: while some models, like mPLUG-owl and Qwen-VL2, demonstrate effective attention to sentiment-relevant features, others exhibit scattered focus and performance declines, especially as the spatial distance between objects increases. This research underscores the need for MLLMs to enhance accuracy in complex, multi-object sentiment analysis tasks and establishes MOSABench as a foundational tool for advancing sentiment analysis capabilities in MLLMs.
https://arxiv.org/abs/2412.00060
In this paper, we apply a method to quantify biases associated with named entities from various countries. We create counterfactual examples with small perturbations on target-domain data instead of relying on templates or specific datasets for bias detection. On widely used classifiers for subjectivity analysis, including sentiment, emotion, hate speech, and offensive text using Twitter data, our results demonstrate positive biases related to the language spoken in a country across all classifiers studied. Notably, the presence of certain country names in a sentence can strongly influence predictions, up to a 23% change in hate speech detection and up to a 60% change in the prediction of negative emotions such as anger. We hypothesize that these biases stem from the training data of pre-trained language models (PLMs) and find correlations between affect predictions and PLM likelihood in English and unknown languages like Basque and Maori, revealing distinct patterns with exacerbated correlations. Further, we tracked these correlations between counterfactual examples derived from the same sentence to remove the syntactic component, uncovering interesting results suggesting the impact of the pre-training data was more important for English-speaking-country names. Our anonymized code is available at this https URL.
https://arxiv.org/abs/2407.01834
In the field of deep learning, Graph Neural Networks (GNNs) and Graph Transformer models, with their outstanding performance and flexible architectural designs, have become leading technologies for processing structured data, especially graph data. Traditional GNNs often face challenges in capturing information from distant vertices effectively. In contrast, Graph Transformer models are particularly adept at managing long-distance node relationships. Despite these advantages, Graph Transformer models still encounter issues with computational and storage efficiency when scaled to large graph datasets. To address these challenges, we propose an innovative Graph Neural Network (GNN) architecture that integrates a Top-m attention mechanism aggregation component and a neighborhood aggregation component, effectively enhancing the model's ability to aggregate relevant information from both local and extended neighborhoods at each layer. This method not only improves computational efficiency but also enriches the node features, facilitating a deeper analysis of complex graph structures. Additionally, to assess the effectiveness of our proposed model, we have applied it to citation sentiment prediction, a novel task previously unexplored in the GNN field. Accordingly, we constructed a dedicated citation network, ArXivNet. In this dataset, we specifically annotated the sentiment polarity of the citations (positive, neutral, negative) to enable in-depth sentiment analysis. Our approach has shown superior performance across a variety of tasks including vertex classification, link prediction, sentiment prediction, graph regression, and visualization. It outperforms existing methods in terms of effectiveness, as demonstrated by experimental results on multiple datasets.
https://arxiv.org/abs/2411.15458
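The Top-m attention aggregation mentioned in the abstract above can be illustrated with a small sketch: a node attends only to the m candidates with the highest attention scores rather than to every vertex, which is one way to bound the cost of long-range aggregation. The abstract does not give the exact formulation, so the dot-product scoring, the softmax over the selected candidates, and the function name `top_m_attention` are all assumptions made for illustration.

```python
from math import exp
from typing import List, Sequence, Tuple


def top_m_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    m: int,
) -> Tuple[List[float], List[int]]:
    """Aggregate the values of the m highest-scoring keys with softmax weights."""
    if not 1 <= m <= len(keys):
        raise ValueError("m must be between 1 and the number of keys")
    # Dot-product relevance of every candidate to the query node.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Keep only the m best candidates; ties keep their original order.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:m]
    # Numerically stable softmax restricted to the selected candidates.
    max_score = max(scores[i] for i in top)
    weights = [exp(scores[i] - max_score) for i in top]
    norm = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for d, v in enumerate(values[i]):
            out[d] += (w / norm) * v
    return out, top


# Hypothetical example: two equally relevant candidates and one irrelevant one.
aggregated, selected = top_m_attention(
    query=[1.0, 0.0],
    keys=[[10.0, 0.0], [10.0, 0.0], [-10.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    m=2,
)
```

In a full layer this output would presumably be combined with a conventional neighborhood aggregation over direct neighbors, as the abstract describes, but how the two components are merged is not stated there.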