The troubling ability of Machine Learning to detect a patient’s race can tangibly influence patient management

Author(s):
Addison Heffernan; Isaac Sears; Daithi Heffernan

Background:

Racial bias is known to affect healthcare outcomes. It is believed that machine learning models (MLMs), based purely on mathematical algorithms, should be agnostic to race. However, we previously demonstrated that MLMs are affected when the race of a patient is included in the dataset.

Hypothesis:

Given that many socioeconomic factors and the presence or absence of medical conditions are associated with race, we hypothesized that MLMs can predict the race of a patient even when race is excluded from the dataset, and can thereby affect therapeutic strategies based on this mathematically predicted racial profile.

Methods:

Five years of NSQIP data were imported into Python, pre-processed, and split into 80% training and 20% testing sets to generate MLMs. The testing set, which excluded race as a category, was used to predict the race of the patient. We assessed the validity of four MLMs (XGBoost (XGB), K-Nearest Neighbors (KNN), Random Forest (RanFor), and Logistic Regression (LR)), ranked by reported area under the ROC curve (AUC). Finally, adjusted modeling was used to determine whether the MLMs suggested operative versus non-operative management based on the predicted race of the patient.
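The modeling workflow described above can be sketched as follows. This is a minimal illustration using synthetic data in place of NSQIP records (which require a data-use agreement); scikit-learn's GradientBoostingClassifier stands in for XGBoost, and the feature matrix is randomly generated rather than drawn from the study's actual predictors.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
# Synthetic stand-in for pre-processed clinical features (not NSQIP data)
X = rng.normal(size=(n, 6))
# Synthetic binary label loosely dependent on the features
y = (X @ rng.normal(size=6) + rng.normal(scale=0.5, size=n) > 0).astype(int)

# 80% training / 20% testing split, as in the study design
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "GradBoost (XGB stand-in)": GradientBoostingClassifier(random_state=0),
    "RanFor": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
}
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # ROC AUC computed from predicted probabilities on the held-out 20%
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

Comparing held-out AUCs in this way yields the model hierarchy reported in the Results.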

Results:

Overall, 3,416,094 NSQIP patients were included, with an average age of 57.4 years; 57% were male and 14% of cases were non-elective. Within the dataset, 47.7% of patients were Caucasian, 8.8% Black, 10.6% Hispanic, and 4.1% Asian. Despite race being excluded, MLMs were able to predict the race of the patient with an overall probability of approximately 75%. The ability to predict race was highest among emergency cases. Further, a hierarchy among MLMs was evident, wherein XGB and RanFor had superior predictive ability (AUC = 0.84) compared with KNN (AUC = 0.56). The factors that most influenced the ability of the MLMs to predict the race of the patient included chronic renal disease, albumin level, age, diabetes, BMI, and socioeconomic factors including insurance status. Finally, when this race-predicting model was applied to management algorithms (operative versus non-operative), the MLMs displayed bias similar to that seen when the race of the patient was known to the model.
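The influential factors reported above can be surfaced from a fitted tree ensemble via impurity-based feature importances. The sketch below assumes synthetic data; the feature names are illustrative labels taken from the factors the abstract reports, not the study's actual variable encodings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Illustrative feature names drawn from the factors reported in the Results
features = ["chronic_renal_disease", "albumin", "age",
            "diabetes", "bmi", "insurance_status"]
X = rng.normal(size=(2000, len(features)))
# Synthetic label weighted toward albumin and age, for demonstration only
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

clf = RandomForestClassifier(random_state=0).fit(X, y)
# Impurity-based importances sum to 1; ranking exposes dominant predictors
ranked = sorted(zip(features, clf.feature_importances_), key=lambda t: -t[1])
for name, imp in ranked:
    print(f"{name}: {imp:.3f}")
```

In practice, permutation importance or SHAP values are common alternatives that are less biased toward high-cardinality features.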

Conclusions:

It is troubling that, despite removal of the race category from a dataset, machine learning models can still predict the race of a patient. Moreover, despite the impersonal mathematical basis of MLMs, suggested patient care is affected, pointing to potential patterns of structural racism embedded within surgical datasets.