Abstract
In the United States, primary open-angle glaucoma (POAG) is the leading cause of blindness, especially among African American and Hispanic individuals. Deep learning has been widely used to detect POAG from fundus images, with performance comparable to or even surpassing diagnosis by clinicians. However, human bias in clinical diagnosis may be reflected and amplified in widely used deep learning models, thus impacting their performance. Biases may cause (1) underdiagnosis, increasing the risk of delayed or inadequate treatment, and (2) overdiagnosis, which may increase individuals' stress and fear, reduce their well-being, and lead to unnecessary or costly treatment. In this study, we examined underdiagnosis and overdiagnosis when applying deep learning to POAG detection, based on the Ocular Hypertension Treatment Study (OHTS), conducted at 22 centers across 16 states in the United States. Our results show that the widely used deep learning model can underdiagnose or overdiagnose underserved populations. The most underdiagnosed group is younger (< 60 yrs) females, and the most overdiagnosed group is older (>= 60 yrs) Black individuals. Biased diagnosis by traditional deep learning methods may delay disease detection and treatment and create burdens among underserved populations, thereby raising ethical concerns about using deep learning models in ophthalmology clinics.
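The abstract does not say how underdiagnosis and overdiagnosis are measured. A common convention in fairness audits of diagnostic models, assumed here, is to report the false negative rate (missed POAG cases) and the false positive rate (healthy eyes flagged as POAG) per subgroup. The sketch below illustrates that computation; the function name, the record layout, the subgroup labels, and the toy data are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def subgroup_error_rates(records):
    """Compute per-subgroup false negative and false positive rates.

    records: iterable of (group, y_true, y_pred), where 1 = POAG, 0 = no POAG.
    Returns {group: {"fnr": float | None, "fpr": float | None}}.
    A rate is None when the subgroup has no cases of the relevant class.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "fp" if y_pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]  # eyes that truly have POAG
        negatives = c["fp"] + c["tn"]  # eyes that truly do not
        rates[group] = {
            # Assumed proxy for underdiagnosis: share of true cases missed.
            "fnr": c["fn"] / positives if positives else None,
            # Assumed proxy for overdiagnosis: share of healthy eyes flagged.
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return rates


# Made-up toy records for illustration only; these are not OHTS data.
toy_records = [
    ("female_under_60", 1, 0),
    ("female_under_60", 1, 1),
    ("female_under_60", 0, 0),
    ("black_60_plus", 1, 1),
    ("black_60_plus", 0, 1),
    ("black_60_plus", 0, 0),
]
rates = subgroup_error_rates(toy_records)
```

Comparing each subgroup's rates against the overall rates, or against a reference group, is one way to identify which groups a model tends to under- or overdiagnose.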
URL
https://arxiv.org/abs/2301.11315