Abstract
Conditional independence (CI) constraints are critical for defining and evaluating fairness in machine learning, as well as for learning unconfounded or causal representations. Traditional methods for ensuring fairness either blindly learn invariant features with respect to a protected variable (e.g., race when classifying sex from face images) or enforce CI relative to the protected attribute only on the model output (e.g., the sex label). Neither of these methods is effective at enforcing CI in high-dimensional feature spaces. In this paper, we focus on a nascent approach that characterizes the CI constraint in terms of two Jensen-Shannon divergence terms, and we extend it to high-dimensional feature spaces using a novel dynamic sampling strategy. In doing so, we introduce a new training paradigm that can be applied to any encoder architecture. We enforce conditional independence of the diffusion autoencoder latent representation with respect to any protected attribute under the equalized odds constraint, and we show that this approach enables causal image generation with a controllable latent space. Our experimental results demonstrate that our approach achieves high accuracy on downstream tasks while upholding equality of odds.
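For reference, the constraints named above can be stated as conditional independence relations. The notation below (predictor \(\hat{Y}\), representation \(Z\), protected attribute \(A\), label \(Y\)) is assumed for illustration and is not taken from the paper; the equivalence with a vanishing Jensen-Shannon divergence follows from JSD being zero iff its two arguments are equal:

```latex
\underbrace{\hat{Y} \perp\!\!\!\perp A \mid Y}_{\text{equalized odds (output)}}
\qquad
\underbrace{Z \perp\!\!\!\perp A \mid Y}_{\text{latent-space CI}}
\;\Longleftrightarrow\;
p(z \mid a, y) = p(z \mid y)
\;\Longleftrightarrow\;
\mathrm{JSD}\!\left( p(z \mid a, y) \,\middle\|\, p(z \mid y) \right) = 0
\quad \forall\, a, y
```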
URL
https://arxiv.org/abs/2404.13798