Abstract
Computer-based scene understanding has influenced fields ranging from urban planning to autonomous vehicle performance, yet little is known about how well these technologies work across social differences. We investigate the biases of deep convolutional neural networks (dCNNs) in scene classification, using nearly one million images from global and US sources, including user-submitted home photographs and Airbnb listings. We applied statistical models to quantify the impact of socioeconomic indicators such as family income, Human Development Index (HDI), and demographic factors drawn from public data sources (the CIA and the US Census) on dCNN performance. Our analyses revealed significant socioeconomic bias: pretrained dCNNs demonstrated lower classification accuracy, lower classification confidence, and a higher tendency to assign labels that could be offensive when applied to homes (e.g., "ruin", "slum"), especially for images of homes with lower socioeconomic status (SES). This trend is consistent across two datasets of international images and within the diverse economic and racial landscapes of the United States. This research contributes to understanding biases in computer vision, emphasizing the need for more inclusive and representative training datasets. By mitigating bias in computer vision pipelines, we can ensure fairer and more equitable outcomes for applied computer vision, including home valuation and smart home security systems. Addressing these biases is urgent, as they can significantly affect critical decisions in urban development and resource allocation. Our findings also motivate the development of AI systems that better understand and serve diverse communities, moving towards technology that equitably benefits all sectors of society.
URL
https://arxiv.org/abs/2401.13097