Inter-rater Reliability of EEG Interpretation: A Large Single-Center Study
Abstract number :
3.114
Submission category :
3. Clinical Neurophysiology
Year :
2011
Submission ID :
15180
Source :
www.aesnet.org
Presentation date :
12/2/2011 12:00:00 AM
Published date :
Oct 4, 2011, 07:57 AM
Authors :
G. Chari, S. G. Abdel Baki, V. Arnedo, E. Koziorynska, C. A. Lushbough, D. Maus, T. D. McSween, K. A. Mortati, A. Omurtag, A. Reznikov, J. Weedon, A. C. Grant
Rationale: Inter-rater reliability (IRR) of EEG interpretation has significant clinical implications, since there is no gold standard against which an EEG's true interpretation can be verified. Prior studies of EEG IRR have been limited by the number of interpreters, number of EEGs, heterogeneity of EEGs, and methods of categorizing EEG findings. The purpose of this study was to: (i) examine the IRR of EEG interpretation among 6 raters; (ii) require raters to choose one of seven electrographic categories for each EEG; and (iii) confirm IRR group agreement by assessing pairwise reliability among all possible pairs of raters.
Methods: The sample consisted of 200 representative EEGs from patients ≥1 year old. Six board-certified EEGers (A-F) were divided into 20 groups of 3 readers (ABC, ABD, ..., DEF). Each of the 20 groups interpreted 10 EEGs, so that each EEG was interpreted by 3 readers and each reader interpreted 100 studies. Readers knew patient age and medications, but were blinded to clinical history, indication for EEG, and each other's interpretations. Readers had to assign probabilities to one or more of 7 diagnostic categories (Table 1), and one category had to have a higher probability than all the others. Fleiss' kappa (Kf) was used as a measure of IRR among all 6 readers. Cohen's kappa coefficient (Kc) was used as a measure of IRR among rater pairs and was compared to the aggregated Kf.
Results: Inter-rater agreement was moderate for EEG categories NL, Kf = 0.554; Epi, Kf = 0.495; and Slow, Kf = 0.453. By contrast, agreement was slight to fair for the following categories: Epi+Slow, Kf = 0.353; SE, Kf = 0.259; SZ, Kf = 0.320; and UI, Kf = 0.184. The aggregated agreement over all 7 categories was in the moderate range (Kf = 0.442), consistent with a Kc of 0.43 when rater pair agreement was summarized across all EEG classes.
Conclusions: This study highlights the subjective nature of EEG interpretation, with no Kf scores exceeding 0.6. It also reveals variability of IRR as a function of EEG diagnostic category. The relatively low agreement for the categories of status epilepticus and seizure represents a serious limitation in the diagnostic reproducibility of ictal EEG features. Possible methods to improve IRR of EEG interpretation include establishing definitive consensus guidelines for certain EEG features, and periodic review of controversial EEGs among groups of EEGers with the goal of achieving a consensus interpretation.
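The reader design and the two agreement statistics named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names and toy labels are hypothetical, and it assumes each reader's highest-probability category is taken as a single label per EEG. It uses the standard textbook definitions of Fleiss' kappa (a fixed number of raters per item, who need not be the same individuals) and Cohen's kappa (two raters scoring the same items).

```python
from collections import Counter
from itertools import combinations

# Category abbreviations as they appear in the Results (Table 1 is not reproduced here).
CATEGORIES = ["NL", "Epi", "Slow", "Epi+Slow", "SE", "SZ", "UI"]


def reader_groups(readers="ABCDEF", size=3):
    """Enumerate every group of `size` readers: ABC, ABD, ..., DEF."""
    return ["".join(group) for group in combinations(readers, size)]


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that each received the same number of ratings.

    ratings: list of label lists, one inner list per item (EEG).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_observed = agreement_sum / n_items
    # Chance agreement from the overall share of each category.
    p_expected = sum((totals[c] / (n_items * n_raters)) ** 2 for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


def cohen_kappa(labels_a, labels_b, categories):
    """Cohen's kappa for two raters who labeled the same items."""
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


groups = reader_groups()
# Studies per reader if every group reads 10 EEGs.
studies_per_reader = {r: 10 * sum(r in g for g in groups) for r in "ABCDEF"}

# Invented toy labels, for illustration only (not study data).
toy_ratings = [["NL", "NL", "Epi"], ["Epi", "Epi", "Epi"]]
toy_kf = fleiss_kappa(toy_ratings, CATEGORIES)
toy_kc = cohen_kappa(["NL", "NL", "Epi", "Epi"], ["NL", "Epi", "Epi", "Epi"], CATEGORIES)
```

With six readers there are 20 possible groups of three, and each reader belongs to 10 of them. This matches the stated design of 200 EEGs, 3 readers per EEG, and 100 studies per reader.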