Abstract
The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases pose immediate risks when the algorithms are applied. It was, for instance, shown that neural networks can deduce racial information solely from a patient's X-ray scan, a task beyond the capability of medical experts. If the medical expert is unaware of this, automatic decision-making based on such an algorithm could lead to prescribing a treatment (purely) based on racial information. While current methodologies allow for the "orthogonalization" or "normalization" of neural networks with respect to such information, existing approaches are grounded in linear models. Our paper advances the discourse by introducing corrections for non-linearities such as ReLU activations. Our approach also encompasses scalar and tensor-valued predictions, facilitating its integration into neural network architectures. Through extensive experiments, we validate our method's effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes.
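The abstract contrasts the proposed method with existing linear-model-based orthogonalization. As background, a minimal sketch of that classical linear baseline (not the paper's own method) is residualization: the part of a feature matrix that is linearly explained by a protected attribute is projected out. All names (`X`, `Z`, `orthogonalize`) are illustrative assumptions, not from the paper.

```python
import numpy as np

def orthogonalize(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Remove the part of X that is linearly explained by Z.

    Computes X - Z (Z^T Z)^{-1} Z^T X, i.e. the projection of X onto
    the orthogonal complement of Z's column space.
    """
    # Solve the least-squares problem Z @ B ~= X and subtract the fit.
    B, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return X - Z @ B

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))                    # protected attribute(s)
X = Z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(100, 5))
X_orth = orthogonalize(X, Z)

# By the normal equations, Z^T (X - Z B) = 0, so X_orth carries no
# linear trace of Z (up to floating-point error).
print(np.abs(Z.T @ X_orth).max())
```

This guarantee is exactly linear: a non-linear model downstream (e.g. with ReLU activations) can still recover information about `Z` from `X_orth`, which motivates the non-linear corrections the paper introduces.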
URL
https://arxiv.org/abs/2405.02475