Abstract
In this work, we propose a novel tree-based explanation technique, PEACH (Pretrained-embedding Explanation Across Contextual and Hierarchical Structure), which explains how text documents are classified by using any pretrained contextual embeddings in a tree-based, human-interpretable manner. Notably, PEACH can adopt the contextual embeddings of any PLM as training input for the decision tree. Using the proposed PEACH, we perform a comprehensive analysis of several contextual embeddings on nine NLP text classification benchmarks. This analysis demonstrates the flexibility of the model across several PLM contextual embeddings, attribute selection methods, scaling methods, and clustering methods. Furthermore, we show the utility of the explanations by visualising the feature selection and important trends of text classification via human-interpretable word-cloud-based trees, which clearly identify model mistakes and assist in dataset debugging. Beyond interpretability, PEACH performs comparably to or better than the pretrained models.
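The core idea of feeding PLM embeddings to a decision tree can be sketched as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for contextual embeddings (in practice these would be, e.g., 768-dimensional [CLS] representations from BERT), and a plain scikit-learn `DecisionTreeClassifier` stands in for PEACH's tree construction.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder: random vectors stand in for PLM contextual embeddings
# (in practice, e.g. the 768-d [CLS] embedding of each document from BERT).
rng = np.random.default_rng(0)
n_docs, dim = 200, 768
X = rng.normal(size=(n_docs, dim))
y = rng.integers(0, 2, size=n_docs)
# Make the toy task learnable: shift class-1 embeddings along one dimension.
X[y == 1, 0] += 2.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# A shallow tree over embedding dimensions yields an interpretable structure;
# PEACH additionally applies feature selection, scaling, and clustering steps.
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {tree.score(X_te, y_te):.2f}")
```

Each internal node of such a tree splits on embedding features, which PEACH then renders as human-interpretable word clouds rather than raw dimensions.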
URL
https://arxiv.org/abs/2404.13645