Abstract
In fair classification, it is common to train a model and to compare and correct its subgroup-specific error rates for disparities. However, even if a model's classification decisions satisfy a fairness metric, it is not necessarily the case that these decisions are equally confident. This becomes clear if we measure variance: we can fix everything in the learning process except the subset of training data, train multiple models, measure (dis)agreement in predictions for each test example, and interpret disagreement as an indication that the learning process is unstable with respect to that classification decision. Empirically, some decisions can in fact be so unstable that they are effectively arbitrary. To reduce this arbitrariness, we formalize a notion of self-consistency of a learning process, develop an ensembling algorithm that provably increases self-consistency, and empirically demonstrate that it often improves both fairness and accuracy. Further, our evaluation reveals a startling observation: applying ensembling to common fair classification benchmarks can significantly reduce subgroup error rate disparities, without employing common pre-, in-, or post-processing fairness interventions. Taken together, our results indicate that variance, particularly on small datasets, can muddle the reliability of conclusions about fairness. One solution is to develop larger benchmark tasks. To this end, we release a toolkit that makes the Home Mortgage Disclosure Act datasets easily usable for future research.
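The measurement procedure described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the toy threshold learner, the per-class bootstrap resampling, and all function names are assumptions made for the example. It trains several models that differ only in their training subsample and scores each test point by expected pairwise agreement (for binary labels, `p^2 + (1-p)^2`, where `p` is the fraction of models predicting 1).

```python
# Sketch of measuring (dis)agreement across models trained on different
# subsets of the data. The threshold learner and names are illustrative
# only and are not taken from the paper.
import numpy as np

def fit_threshold(X, y):
    """Toy 1-D learner: threshold at the midpoint of the class means."""
    t = (X[y == 0].mean() + X[y == 1].mean()) / 2.0
    return lambda Xq: (Xq > t).astype(int)

def self_consistency(X_train, y_train, X_test, n_models=50, seed=0):
    """Expected pairwise agreement of models trained on bootstrap
    resamples; with-replacement form p^2 + (1-p)^2 for binary labels,
    which lies in [0.5, 1] (1 = all models agree)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y_train))
    preds = []
    for _ in range(n_models):
        # Resample within each class so both classes stay represented.
        boot = np.concatenate([
            rng.choice(idx[y_train == c], size=(y_train == c).sum())
            for c in (0, 1)
        ])
        model = fit_threshold(X_train[boot], y_train[boot])
        preds.append(model(X_test))
    p = np.mean(preds, axis=0)       # per-example rate of predicting 1
    return p**2 + (1.0 - p)**2       # agreement in [0.5, 1]

rng = np.random.default_rng(1)
X_train = np.concatenate([rng.normal(-1, 1, 200), rng.normal(1, 1, 200)])
y_train = np.concatenate([np.zeros(200, int), np.ones(200, int)])
X_test = np.array([-3.0, 0.0, 3.0])  # far from, near, far from the boundary
sc = self_consistency(X_train, y_train, X_test)
```

Points far from the decision boundary get agreement near 1, while borderline points tend toward 0.5: these are the decisions the abstract describes as effectively arbitrary.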
URL
https://arxiv.org/abs/2301.11562