Paper Reading AI Learner

Large Language Models as Counterfactual Generator: Strengths and Weaknesses

2023-05-24 06:44:32
Yongqi Li, Mayi Xu, Xin Miao, Shen Zhou, Tieyun Qian

Abstract

Large language models (LLMs) have demonstrated remarkable performance in a range of natural language understanding and generation tasks. Yet, their ability to generate counterfactuals, which can be used for areas like data augmentation, remains under-explored. This study aims to investigate the counterfactual generation capabilities of LLMs and analyze the factors that influence this ability. First, we evaluate how effective LLMs are at counterfactual generation through data augmentation experiments for small language models (SLMs) across four tasks: sentiment analysis, natural language inference, named entity recognition, and relation extraction. While LLMs show promising enhancements in various settings, they struggle in complex tasks due to their self-limitations and the lack of logical guidance to produce counterfactuals that align with commonsense. Second, our analysis reveals the pivotal role of providing accurate task definitions and detailed step-by-step instructions to LLMs in generating counterfactuals. Interestingly, we also find that LLMs can generate reasonable counterfactuals even with unreasonable demonstrations, which illustrates that demonstrations serve primarily to regulate the output format. This study provides the first comprehensive insight into the counterfactual generation abilities of LLMs, and offers a novel perspective on utilizing LLMs for data augmentation to enhance SLMs.
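As a rough illustration of the prompting setup the abstract analyzes (an accurate task definition, detailed step-by-step instructions, and demonstrations that mainly regulate the output format), the sketch below assembles such a prompt for the sentiment-analysis setting. All function names and template wording here are hypothetical, not the authors' actual templates.

```python
def build_counterfactual_prompt(text: str, label: str, target_label: str) -> str:
    """Assemble a counterfactual-generation prompt from the three
    components the study analyzes: task definition, step-by-step
    instructions, and a format-regulating demonstration.
    (Hypothetical template, not the paper's exact wording.)"""
    task_definition = (
        "Task: given a sentence and its sentiment label, rewrite the "
        "sentence with minimal edits so that its sentiment flips to the "
        "target label while the text stays fluent and plausible."
    )
    step_by_step = (
        "Steps:\n"
        "1. Identify the words that carry the current sentiment.\n"
        "2. Replace or negate only those words.\n"
        "3. Keep all sentiment-neutral content unchanged."
    )
    # One demonstration; per the study's finding, its main role is to
    # fix the output format rather than to teach the task itself.
    demonstration = (
        "Example:\n"
        "Input: The movie was wonderful. (positive -> negative)\n"
        "Output: The movie was dreadful."
    )
    query = f"Input: {text} ({label} -> {target_label})\nOutput:"
    return "\n\n".join([task_definition, step_by_step, demonstration, query])

prompt = build_counterfactual_prompt(
    "The service was quick and friendly.", "positive", "negative"
)
print(prompt)
```

The returned string would then be sent to the LLM; the generated counterfactual pair can be added to the SLM's training data, mirroring the data augmentation experiments described above.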

URL

https://arxiv.org/abs/2305.14791

PDF

https://arxiv.org/pdf/2305.14791.pdf
