Abstract
Gradient-based explanation methods play an important role in interpreting complex deep neural networks for NLP. However, existing work has shown that a model's gradients are unstable and easily manipulated, which substantially undermines the model's reliability. In our preliminary analyses, we also find that the interpretability of gradient-based methods is limited on complex tasks such as aspect-based sentiment classification (ABSC). In this paper, we propose \texttt{IEGA}, an \textbf{I}nterpretation-\textbf{E}nhanced \textbf{G}radient-based framework for \textbf{A}BSC that uses a small number of explanation annotations. Specifically, we first compute a word-level saliency map from gradients to measure the importance of each word in the sentence with respect to the given aspect. We then design a gradient correction module that strengthens the model's attention on the correct parts (e.g., opinion words). Our framework is model-agnostic and task-agnostic, so it can be integrated into existing ABSC methods or applied to other tasks. Comprehensive experiments on four benchmark datasets show that \texttt{IEGA} improves not only the model's interpretability but also its performance and robustness.
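The two steps named in the abstract, a gradient-based word-level saliency map and a correction signal derived from annotated opinion words, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy model (a class logit equal to the weighted sum of `tanh` of each embedding dimension), the function names `word_gradients`, `saliency_map`, and `correction_penalty`, and the squared-error penalty are hypothetical and are not taken from the paper, whose actual architecture and loss may differ.

```python
import math


def word_gradients(embeddings, class_weights):
    """Gradient of the class logit with respect to each word embedding.

    Toy model (assumed, not the paper's):
        logit = sum_i sum_d w[d] * tanh(e[i][d])
    so d logit / d e[i][d] = w[d] * (1 - tanh(e[i][d]) ** 2).
    """
    return [
        [w * (1.0 - math.tanh(x) ** 2) for w, x in zip(class_weights, emb)]
        for emb in embeddings
    ]


def saliency_map(embeddings, class_weights):
    """Word-level saliency: L1 norm of each word's gradient, normalized to sum to 1."""
    grads = word_gradients(embeddings, class_weights)
    raw = [sum(abs(g) for g in grad) for grad in grads]
    total = sum(raw)
    if total == 0.0:
        # All gradients vanish: fall back to a uniform map.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def correction_penalty(saliency, opinion_mask):
    """Squared distance between the saliency map and a uniform target
    over the annotated opinion words (one assumed form of a correction signal)."""
    n_opinion = sum(opinion_mask)
    if n_opinion == 0:
        raise ValueError("opinion_mask must mark at least one word")
    target = [m / n_opinion for m in opinion_mask]
    return sum((s - t) ** 2 for s, t in zip(saliency, target))


# Hypothetical sentence with made-up 2-d embeddings; "great" is the annotated opinion word.
words = ["the", "food", "was", "great"]
embeddings = [[2.0, -2.0], [0.5, 0.1], [1.5, 1.5], [0.1, -0.2]]
class_weights = [1.0, -0.5]
opinion_mask = [0, 0, 0, 1]

saliency = saliency_map(embeddings, class_weights)
penalty = correction_penalty(saliency, opinion_mask)
for word, score in zip(words, saliency):
    print(f"{word}: {score:.3f}")
print(f"correction penalty: {penalty:.3f}")
```

In a training loop, a penalty of this kind would typically be added to the classification loss for the few sentences that carry explanation annotations, which pushes the saliency mass toward the annotated words. The weighting between the two terms is a design choice this sketch leaves open.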
URL
https://arxiv.org/abs/2302.10479