Paper Reading AI Learner

Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning Systems

2024-01-23 21:22:06
Michelle R. Greene, Mariam Josyula, Wentao Si, Jennifer A. Hart

Abstract

Computer-based scene understanding has influenced fields ranging from urban planning to autonomous vehicle performance, yet little is known about how well these technologies work across social differences. We investigate the biases of deep convolutional neural networks (dCNNs) in scene classification, using nearly one million images from global and US sources, including user-submitted home photographs and Airbnb listings. We applied statistical models to quantify the impact of socioeconomic indicators such as family income, Human Development Index (HDI), and demographic factors from public data sources (CIA and US Census) on dCNN performance. Our analyses revealed significant socioeconomic bias: pretrained dCNNs demonstrated lower classification accuracy, lower classification confidence, and a higher tendency to assign labels that could be offensive when applied to homes (e.g., "ruin", "slum"), especially for images from homes with lower socioeconomic status (SES). This trend is consistent across two datasets of international images and within the diverse economic and racial landscapes of the United States. This research contributes to understanding biases in computer vision, emphasizing the need for more inclusive and representative training datasets. By mitigating bias in computer vision pipelines, we can ensure fairer and more equitable outcomes for applied computer vision, including home valuation and smart home security systems. There is urgency in addressing these biases, which can significantly impact critical decisions in urban development and resource allocation. Our findings also motivate the development of AI systems that better understand and serve diverse communities, moving towards technology that equitably benefits all sectors of society.
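The abstract describes regressing dCNN performance on socioeconomic indicators such as HDI. The sketch below illustrates the general shape of such an analysis with a simple ordinary-least-squares fit of per-image classification confidence against HDI. All data here are synthetic and the effect size is invented for illustration; this is not the paper's actual model or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: regress per-image classification confidence
# on a socioeconomic indicator (HDI). Values are synthetic, chosen only
# to mimic the qualitative finding that confidence drops with lower SES.
n = 1000
hdi = rng.uniform(0.4, 0.95, size=n)                  # fake HDI per image
noise = rng.normal(0.0, 0.05, size=n)
confidence = np.clip(0.3 + 0.5 * hdi + noise, 0.0, 1.0)

# Ordinary least squares: confidence ~ intercept + slope * HDI
X = np.column_stack([np.ones(n), hdi])
beta, *_ = np.linalg.lstsq(X, confidence, rcond=None)
intercept, slope = beta

# A positive slope indicates lower classifier confidence at lower SES.
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

In the paper's setting, `confidence` would come from a pretrained scene classifier (e.g., softmax probability of the top label) and `hdi` from CIA/US Census sources; a richer model would add family income and demographic covariates as additional columns of `X`.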

URL

https://arxiv.org/abs/2401.13097

PDF

https://arxiv.org/pdf/2401.13097.pdf

