Paper Reading AI Learner

Variance, Self-Consistency, and Arbitrariness in Fair Classification

2023-01-27 06:52:04
A. Feder Cooper, Solon Barocas, Christopher De Sa, Siddhartha Sen

Abstract

In fair classification, it is common to train a model, and to compare and correct subgroup-specific error rates for disparities. However, even if a model's classification decisions satisfy a fairness metric, it is not necessarily the case that these decisions are equally confident. This becomes clear if we measure variance: We can fix everything in the learning process except the subset of training data, train multiple models, measure (dis)agreement in predictions for each test example, and interpret disagreement to mean that the learning process is more unstable with respect to its classification decision. Empirically, some decisions can in fact be so unstable that they are effectively arbitrary. To reduce this arbitrariness, we formalize a notion of self-consistency of a learning process, develop an ensembling algorithm that provably increases self-consistency, and empirically demonstrate its utility to often improve both fairness and accuracy. Further, our evaluation reveals a startling observation: Applying ensembling to common fair classification benchmarks can significantly reduce subgroup error rate disparities, without employing common pre-, in-, or post-processing fairness interventions. Taken together, our results indicate that variance, particularly on small datasets, can muddle the reliability of conclusions about fairness. One solution is to develop larger benchmark tasks. To this end, we release a toolkit that makes the Home Mortgage Disclosure Act datasets easily usable for future research.
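The measurement the abstract describes can be sketched in code: fix everything in the learning process except the training subset, train many models, and treat per-example disagreement as instability. Below is a minimal, hedged illustration (not the paper's exact algorithm), using bootstrap resamples and logistic regression as stand-ins; `self_consistency` here is the probability that two independently trained models agree on an example, which lies in [0.5, 1.0] for binary classification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, y_train = X[:400], y[:400]
X_test = X[400:]

B = 50  # number of models trained on different bootstrap resamples
preds = np.empty((B, len(X_test)), dtype=int)
for b in range(B):
    idx = rng.integers(0, len(X_train), len(X_train))  # bootstrap resample
    clf = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    preds[b] = clf.predict(X_test)

# Per-example self-consistency: chance that two independently
# trained models agree on this example's predicted label.
p1 = preds.mean(axis=0)                  # fraction of models predicting class 1
self_consistency = p1**2 + (1 - p1)**2   # in [0.5, 1.0]; 0.5 = coin flip

# A simple majority-vote ensemble reduces this arbitrariness:
# its prediction is the label most models agree on.
ensemble_pred = (p1 >= 0.5).astype(int)

print(f"mean self-consistency: {self_consistency.mean():.3f}")
print(f"least consistent test example: {self_consistency.min():.3f}")
```

Examples with `self_consistency` near 0.5 are the effectively arbitrary decisions the abstract refers to: which label they receive depends mostly on which training subset happened to be drawn.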


URL

https://arxiv.org/abs/2301.11562

PDF

https://arxiv.org/pdf/2301.11562.pdf

