$PredDiff$: Explanations and Interactions from Conditional Expectations


Abstract

$PredDiff$ is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure how a prediction changes when feature variables are marginalized out. In this work, we clarify properties of $PredDiff$ and put forward several extensions of the original formalism. Most notably, we introduce a new measure for interaction effects. Understanding interactions is an essential step towards a comprehensive understanding of black-box models. Importantly, our framework readily allows interactions between arbitrary feature subsets to be investigated, and it scales linearly with their number. We demonstrate the soundness of $PredDiff$ relevances and interactions in both classification and regression settings. To this end, we use a range of analytic, synthetic and real-world datasets.
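
The idea of measuring prediction changes under marginalization can be illustrated with a minimal sketch for the regression case. This is not the authors' implementation. It assumes that the conditional expectation is approximated by imputing the removed features from the rows of a background dataset, which ignores their dependence on the remaining features, and that an interaction is taken to be the joint relevance minus the individual relevances. The names `relevance`, `interaction`, `toy_model` and `background` are hypothetical and chosen for illustration only.

```python
import numpy as np


def relevance(model, x, background, features):
    """Prediction change when `features` of sample `x` are marginalized out.

    Assumption: the removed features are imputed with the values found in
    each row of `background` (empirical marginal), not with a learned
    conditional distribution.
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    # One copy of x per background row, with the chosen features replaced.
    imputed = np.tile(x, (len(background), 1))
    imputed[:, features] = background[:, features]
    # Original prediction minus the mean prediction over imputed samples.
    return model(x[None, :])[0] - model(imputed).mean()


def interaction(model, x, background, a, b):
    """Assumed interaction measure: joint relevance of a and b together,
    minus the relevance of a alone and the relevance of b alone."""
    joint = relevance(model, x, background, list(a) + list(b))
    return (joint
            - relevance(model, x, background, list(a))
            - relevance(model, x, background, list(b)))


def toy_model(X):
    """Hypothetical regressor: features 0 and 1 interact, feature 2 is additive."""
    return X[:, 0] * X[:, 1] + X[:, 2]


background = np.array([[0.0, 0.0, 0.0],
                       [2.0, 2.0, 2.0]])
x = np.array([3.0, 1.0, 5.0])

rel_feature_2 = relevance(toy_model, x, background, [2])
int_0_1 = interaction(toy_model, x, background, [0], [1])
int_0_2 = interaction(toy_model, x, background, [0], [2])
```

In this toy setup the interaction between features 0 and 1 is non-zero because they enter the model as a product, while the interaction between features 0 and 2 vanishes because feature 2 contributes additively. Each relevance or interaction costs a fixed number of model evaluations per feature subset under these assumptions.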

URL

https://arxiv.org/abs/2102.13519

PDF

https://arxiv.org/pdf/2102.13519.pdf