Paper Reading AI Learner

LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study

2024-04-26 11:57:21
Van Bach Nguyen, Paul Youssef, Jörg Schlötterer, Christin Seifert

Abstract

As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs and evaluate their CFs, assessing both intrinsic metrics and the impact of these CFs on data augmentation. Moreover, we analyze differences between human- and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs, but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than for NLI, where LLMs show weaknesses in generating CFs that flip the original label. This is also reflected in data augmentation performance, where we observe a large gap between augmenting with human CFs and augmenting with LLM CFs. Furthermore, we evaluate LLMs' ability to assess CFs in a mislabelled-data setting, and show that they have a strong bias towards agreeing with the provided labels. GPT-4 is more robust against this bias, and its scores correlate well with automatic metrics. Our findings reveal several limitations and point to potential future work directions.
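As an illustration of the setup the abstract describes, the sketch below builds a prompt asking an LLM to generate a counterfactual for a sentiment-analysis input, and computes a token-level edit distance as a rough proxy for the "minimal changes" criterion. The prompt wording, example texts, and the specific metric are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative sketch (hypothetical prompt wording and metric, not the
# paper's exact protocol): construct a counterfactual-generation prompt
# and score how "minimal" an edit is via token-level Levenshtein distance.

def make_cf_prompt(text: str, label: str, target: str) -> str:
    """Build a prompt (assumed template) asking an LLM for a counterfactual."""
    return (
        f"The following text has the label '{label}'.\n"
        f"Text: {text}\n"
        f"Make minimal edits so that the label becomes '{target}'. "
        f"Return only the edited text."
    )

def token_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace tokens (lower = more minimal CF)."""
    ta, tb = a.split(), b.split()
    prev = list(range(len(tb) + 1))
    for i, x in enumerate(ta, 1):
        cur = [i]
        for j, y in enumerate(tb, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

original = "The movie was wonderful and moving."
counterfactual = "The movie was dull and tedious."  # hypothetical LLM output
print(make_cf_prompt(original, "positive", "negative"))
print(token_edit_distance(original, counterfactual))  # 2 tokens changed
```

In the paper's framing, the LLM's output would replace the hand-written `counterfactual` here, and such distance scores (alongside label-flip rate and fluency) form the intrinsic evaluation.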

URL

https://arxiv.org/abs/2405.00722

PDF

https://arxiv.org/pdf/2405.00722.pdf

