Authors:
Presenting Author: Doyle Yuan, MD – University of Texas Southwestern Medical Center
Roohi Katyal, MD – Louisiana State University Health Sciences Center
Irfan Sheikh, MD – University of Texas Southwestern Medical Center at Dallas
Peter Kaplan, MD – Johns Hopkins
Neel Fotedar, MD – Epilepsy Center, Neurological Institute, University Hospitals Cleveland Medical Center, Cleveland, OH, USA
Jordan Clay, MD – University of Kentucky
Selim Benbadis, MD – University of South Florida
Sindhu Richards, MD – University of Utah
Donald Schomer, MD – Beth Israel Deaconess Medical Center
Kollencheri Puthenveettil Vinayan, MD – Amrita Institute of Medical Sciences
Vitor Pacheco, MD – Baylor College of Medicine
Stephan Schuele, MD, MPH – Northwestern University
John McLaren, MD – Boston Children's Hospital
Katia Lin, MD, PhD – Universidade Federal de Santa Catarina - UFSC
James Fessler, MD – Washington University in St. Louis
Ji Yeoun Yoo, MD – Icahn School of Medicine at Mount Sinai, New York City
Adam Greenblatt, MD – Washington University School of Medicine
Sándor Beniczky, MD, PhD – Aarhus University Hospital, Member of European Reference Network EpiCARE and Danish Epilepsy Centre
M. Brandon Westover, MD, PhD – Beth Israel Deaconess Medical Center
Fábio Nascimento, MD – Washington University School of Medicine
Rationale: The value of EEG relies on accurate and reliable interpretation, which, in turn, depends greatly on the reader. Human judgment is characteristically non-deterministic, varying from one person or one occasion to the next. Prior research has demonstrated that inter-rater reliability is imperfect. Intra-rater variability, however, remains poorly characterized. We performed a “noise audit” to measure intra-rater variability in EEG interpretation by epileptologists.
Methods: Experts independently rated 100 full-length EEGs twice, on two separate occasions (Parts I and II) ≥60 days apart. Experts were unaware that the EEGs were the same in both Parts. EEGs were acquired from adult (56%) and pediatric patients in the outpatient (78%) and non-critical-care inpatient settings. EEGs were presented on our online platform ("EEGHub") with no clinical information other than age and sex. The 100-EEG dataset included 20 studies in each of 5 categories: normal, focal epileptiform (FE), generalized epileptiform (GE), focal non-epileptiform (FNE), and generalized non-epileptiform (GNE) abnormalities. For each recording, experts were asked to indicate the presence and type of any abnormalities and to rate their confidence in their answer. Experts also rated their overall confidence in identifying each of the four abnormal categories. We computed percent self-agreement and used Spearman's correlation and Fisher's exact test to examine relationships between variables.
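The percent self-agreement measure can be illustrated with a minimal Python sketch. It assumes each expert's answer for a given category is coded as present/absent (True/False) for every EEG in each Part, and that the pooled figure weights every expert-EEG pair equally; the abstract does not state how agreement was pooled across experts. The function names and toy data are hypothetical and are not the study's data or code.

```python
"""Percent self-agreement between two reading sessions (a sketch).

Assumption: each expert's answer for one category is a boolean
(abnormality present / absent) per EEG, for Part I and Part II.
The ratings below are hypothetical toy values, not study data.
"""
from typing import Dict, List, Tuple


def percent_self_agreement(part1: List[bool], part2: List[bool]) -> float:
    """Percentage of recordings on which the Part I and Part II answers match."""
    if len(part1) != len(part2) or not part1:
        raise ValueError("both Parts must cover the same, non-empty set of EEGs")
    matches = sum(a == b for a, b in zip(part1, part2))
    return 100.0 * matches / len(part1)


def pooled_self_agreement(ratings: Dict[str, Tuple[List[bool], List[bool]]]) -> float:
    """Pool matches over all experts (hypothetical scheme: every expert-EEG pair counts once)."""
    matches = total = 0
    for part1, part2 in ratings.values():
        if len(part1) != len(part2):
            raise ValueError("both Parts must cover the same EEGs")
        matches += sum(a == b for a, b in zip(part1, part2))
        total += len(part1)
    return 100.0 * matches / total


if __name__ == "__main__":
    # Hypothetical answers to "focal epileptiform abnormality present?" on 5 EEGs.
    toy = {
        "expert_A": ([True, False, True, True, False], [True, False, False, True, False]),
        "expert_B": ([False, False, True, True, True], [False, False, True, True, True]),
    }
    for name, (p1, p2) in toy.items():
        print(name, percent_self_agreement(p1, p2))  # expert_A 80.0, expert_B 100.0
    print("pooled", pooled_self_agreement(toy))  # pooled 90.0
```

With equal numbers of EEGs per expert, pooling and averaging the per-expert percentages give the same result; they diverge only if some experts rated fewer recordings.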
Results: Thirteen experts (mean of 19 years of experience reading EEGs) completed both Parts. Percent self-agreement was 96% for GE, 90% for FE, 88% for FNE, and 87% for GNE abnormalities. Agreement was 90% for any epileptiform abnormality and 82% for any non-epileptiform abnormality. There was a statistically significant negative correlation between self-agreement on the presence of FE abnormalities and confidence in identifying those abnormalities. High-confidence responses were significantly associated with higher rates of self-agreement on the presence of non-epileptiform abnormalities compared with low-confidence responses. No such association was seen for the presence of epileptiform abnormalities.
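The comparison of high- versus low-confidence responses can be framed as a 2×2 table (confidence level × whether the Part I and Part II answers agreed) tested with Fisher's exact test. The sketch below is a stdlib-only Python illustration of the standard two-sided test; it assumes this table layout, which the abstract does not spell out, and the counts in the usage example are hypothetical. In practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead.

```python
"""Two-sided Fisher's exact test on a 2x2 table (a stdlib-only sketch).

Assumed layout (hypothetical): rows = high- vs low-confidence responses,
columns = self-agreement vs disagreement between Parts I and II.
"""
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Return the two-sided p-value for the table [[a, b], [c, d]].

    Row and column totals are held fixed; the p-value is the sum of the
    hypergeometric probabilities of all tables with those margins that
    are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Small relative tolerance so tables tied with the observed one
        # are not dropped because of floating-point rounding.
        if p <= observed * (1 + 1e-7):
            total += p
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: [[agree, disagree] for high confidence,
    #                       [agree, disagree] for low confidence]
    print(fisher_exact_two_sided(3, 1, 1, 3))  # about 0.486
```

Fisher's exact test is a natural choice here because per-category disagreements can be rare, leaving small cell counts for which a chi-squared approximation would be unreliable.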
Conclusions: We found robust levels of within-expert agreement in EEG interpretation. Greater reliability was observed in the identification of epileptiform abnormalities, especially generalized ones, compared to non-epileptiform abnormalities. Nevertheless, the number of flipped decisions on Part II was not negligible and is likely clinically significant, especially concerning epileptiform abnormalities. Confidence ratings suggest that raters were aware of their uncertainties regarding non-epileptiform findings. In contrast, raters who felt more confident in recognizing FE abnormalities paradoxically seemed to agree with themselves less on these identifications. Our study provides valuable information for guiding the development of interventions to improve reliability and, consequently, the accuracy of EEG interpretations.
Funding: None