Authors:
Presenting Author: Kevin Xie, M.S. – University of Pennsylvania
Ryan Gallagher, B.S. – University of Pennsylvania; William Ojemann, B.S. – University of Pennsylvania; Alfredo Lucas, M.S. – University of Pennsylvania; Elizabeth Sweeney, PhD – University of Pennsylvania; Dan Roth, PhD – University of Pennsylvania; Brian Litt, MD – University of Pennsylvania; Colin Ellis, MD – University of Pennsylvania
Rationale:
Health disparities are unfortunately ubiquitous in the healthcare system, including in the care of people with epilepsy. In the field of natural language processing (NLP), there are also concerns that models perpetuate biases against historically marginalized groups. We recently validated an NLP algorithm to extract epilepsy outcome measures from clinic note text. In this study, we sought to address two questions: (1) do our models detect health disparities in historically marginalized groups, and (2) are our models biased against these groups?
Methods:
Our dataset consisted of 79,926 visits from 12,819 patients in our EHR, and included the raw text of their clinical notes, their demographic information, and their prescription records for anti-seizure medications (ASMs). Our NLP pipeline extended pre-trained transformer models developed by Google AI that were fine-tuned to extract seizure freedom from the raw text of clinical notes. To address question (1), we used mixed-effects logistic regression models to understand how our outcome of interest, seizure freedom since the last visit, was affected by the patient’s race, gender, income, insurance, and age, controlling for the interval of time since their last visit. We also included the number of prescribed ASMs as a variable of interest to act as a positive control. To address question (2), we investigated biases of our NLP models by evaluating their accuracy and false negative rates on patients in each group.
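To illustrate the kind of group-wise check described for question (2), the following is a minimal sketch, not the study's actual code. It computes accuracy and false negative rate for two groups and compares their correct/incorrect counts with a two-sided Fisher's exact test. The function names, the group labels, and the tiny label lists are all hypothetical and synthetic; they do not come from the study data.

```python
from math import comb


def accuracy_and_fnr(y_true, y_pred):
    """Return (accuracy, false negative rate) for binary labels (1 = positive)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    false_negatives = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    fnr = false_negatives / positives if positives else float("nan")
    return accuracy, fnr


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Synthetic toy labels for two hypothetical groups (not study data).
group_a_true = [1, 1, 1, 0, 0, 0, 1, 0]
group_a_pred = [1, 1, 0, 0, 0, 0, 1, 0]
group_b_true = [1, 0, 1, 0, 1, 0, 0, 1]
group_b_pred = [1, 0, 1, 0, 0, 0, 1, 1]

acc_a, fnr_a = accuracy_and_fnr(group_a_true, group_a_pred)
acc_b, fnr_b = accuracy_and_fnr(group_b_true, group_b_pred)

# Contingency table of correct vs. incorrect predictions per group.
correct_a = sum(t == p for t, p in zip(group_a_true, group_a_pred))
correct_b = sum(t == p for t, p in zip(group_b_true, group_b_pred))
p_value = fisher_exact_two_sided(
    correct_a, len(group_a_true) - correct_a,
    correct_b, len(group_b_true) - correct_b,
)
print(acc_a, fnr_a, acc_b, fnr_b, p_value)
```

A large p-value in such a test means there is no detectable difference in accuracy between the two groups; with samples as small as this toy example, it says little about whether a real difference exists.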
Results:
In a series of univariate analyses, the likelihood of having a seizure since the last visit was lower in men vs. women (OR 0.74, p = 5×10⁻⁸); patients with private insurance vs. those with public insurance (OR 0.66, p = 4×10⁻¹⁴); patients from higher income zip codes (OR 0.29, p = 5×10⁻¹⁴); older patients (OR 0.27, p = 1×10⁻²⁰); and patients on one ASM vs. those on none or at least two (OR ≤ 0.82, p < 5×10⁻³). White patients were significantly less likely to have had a seizure than non-white patients (OR 0.78, p = 1×10⁻⁵) in the univariate analysis only, suggesting that the effect of race overlaps with that of the other variables in the multivariate analysis. For question (2), we found that our pipeline had unbiased accuracies and false negative rates with regard to these variables (Fisher’s exact and Kolmogorov–Smirnov tests, all p > 0.05), indicating that our findings were not confounded by algorithmic failures.
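As an aid to reading the odds ratios above, here is a small sketch of how an OR translates into a change in probability. An OR scales the odds, not the probability, so the absolute effect depends on the baseline risk. The baseline probability of 0.5 used here is a purely hypothetical value chosen for illustration and is not reported in this abstract; the function name is likewise an assumption.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Convert a baseline probability to the probability implied by an odds ratio."""
    baseline_odds = baseline_prob / (1 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical baseline: 50% chance of a seizure since the last visit.
# OR 0.74 is the reported value for men vs. women.
prob_men = apply_odds_ratio(0.5, 0.74)
print(round(prob_men, 3))  # approximately 0.425
```

Under that assumed baseline, an OR of 0.74 corresponds to a drop from 50% to roughly 42.5%; a different baseline would give a different absolute change.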
Conclusions:
Using the NLP-derived outcomes, we found evidence of health disparities along several demographic axes: gender, income, insurance, and age all significantly contributed to seizure freedom. We did not find evidence of bias within our NLP algorithm that extracts seizure outcomes from clinical note text. Understanding the underlying factors that contribute to treatment disparities between sub-populations is critical to improving and ensuring equitable care for all patients with epilepsy.
Funding:
NINDS 1DP1OD029758, T32NS091006, K23NS121520; NIH R01NS125137; ONR N00014-19-1-2620; the Mirowski Family Foundation; contributions from Neil and Barbara Smit and Jonathan and Bonnie Rothberg; and the AAN Susan S. Spencer Clinical Research Training Scholarship.