Interrater Agreement for Spike Detection in Routine EEG: Spike Scoring Competition Results
Abstract number :
1.044
Submission category :
3. Neurophysiology
Year :
2015
Submission ID :
2325116
Source :
www.aesnet.org
Presentation date :
12/5/2015
Published date :
Nov 13, 2015, 12:43 PM
Authors :
Amir Arain, Giridhar Kalamangalam, Suzette laRoche, Leonardo Bonilha, Maysaa Basha, Nabil Azar, Ekrem Kutluay, Gabriel Martz, Chad Waters, Brian Dean, Jonathan Halford
Rationale: Reliable computerized detection of epileptiform transients (ETs), which appear as interictal spikes and sharp waves in the EEG, is a useful goal because it would assist neurologists in interpreting EEGs. To demonstrate the need for such automated systems, we collected inter-rater scoring data from a group of academic clinical neurophysiologists (ACNs) by conducting a Spike Scoring Competition.

Methods: Two hundred 30-second routine scalp EEG segments from 200 different patients were included in the study. The segments were divided as follows: 100 segments containing subtle ETs or benign paroxysmal activity, such as exaggerated alpha activity, wicket spikes, and small sharp spikes, that may be misinterpreted as abnormal; 50 segments retrieved from consecutive normal EEG recordings; and 50 segments retrieved from consecutive abnormal EEGs interpreted as containing ETs. Level of fellowship training, years of practice, and board certification status were collected for all participants. EEG scoring was performed using EEGnet, a distributed web-based platform for the analysis of scalp EEG recordings. Scoring followed a novel three-phase method. In Phase 1, 18 ACN scorers marked the location of all ETs on any single EEG channel. In Phase 2, ETs marked by at least 2 scorers in Phase 1 were presented to all 18 scorers, who rated how epileptiform each event was on a five-point “ET scale” (5 = very epileptiform, 4 = probably epileptiform, 3 = uncertain, 2 = probably not epileptiform, 1 = definitely not epileptiform). In Phase 3, any event for which an individual scorer’s Phase 1 and Phase 2 labels contradicted each other was re-presented to that scorer for a third (tie-breaking) opinion. For the inter-rater analysis, we examined average pair-wise Cohen’s kappa coefficients (CKC) as well as the Fleiss kappa coefficient (FKC).

Results: The overall FKC for Phase 2-3 scoring across all scorers was low at 0.14 when all 5 points of the ET scale were used, and low to moderate at 0.34 when ET scale scores were grouped into 2 categories: ET (4-5 on the ET scale) versus non-ET (1-3 on the ET scale). The top 7 scorers (the “winners” of the Competition), selected as those with the 7 highest average pair-wise CKCs versus all other scorers, had moderate inter-rater agreement among themselves, with a 2-category FKC of 0.52. The bottom 11 scorers had poor agreement among themselves, with a 2-category FKC of 0.24. The average pair-wise CKC for each scorer versus the top 7 scorers (2-category analysis) ranged from 0.20 to 0.74. Participants who were board certified by the American Board of Clinical Neurophysiology (ABCN) performed better than those who were not (p = 0.05). Performance did not correlate with duration of fellowship training or years in practice.

Conclusions: Inter-rater agreement for labeling ETs shows substantial variability among ACNs and is overall low. A subset of the ACNs achieved moderate inter-rater agreement. ACNs with ABCN board certification performed better than those without it.
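The agreement statistics reported above are standard reliability measures and can be reproduced from a rater-by-event score matrix. The following Python snippet is a minimal sketch, assuming scikit-learn and statsmodels are available; the abstract does not name the software actually used, and the `scores` matrix below is a synthetic stand-in for the real Phase 2-3 annotations. It applies the 2-category grouping (ET = 4-5, non-ET = 1-3), computes each scorer’s average pair-wise Cohen’s kappa, and computes the overall Fleiss kappa.

```python
# Minimal sketch of the agreement analysis described in the Methods.
# `scores` is synthetic data, NOT the study's actual annotations.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
n_events, n_raters = 200, 18
scores = rng.integers(1, 6, size=(n_events, n_raters))  # 5-point ET scale

# Two-category grouping used in the abstract: ET (4-5) vs. non-ET (1-3).
binary = (scores >= 4).astype(int)

# Pair-wise Cohen's kappa for every pair of raters, then the average
# kappa of each rater against all other raters.
pairwise = np.zeros((n_raters, n_raters))
for i, j in combinations(range(n_raters), 2):
    k = cohen_kappa_score(binary[:, i], binary[:, j])
    pairwise[i, j] = pairwise[j, i] = k
avg_ckc = pairwise.sum(axis=1) / (n_raters - 1)

# Overall Fleiss kappa across all raters: aggregate_raters converts the
# (events x raters) labels into per-event category counts.
counts, _ = aggregate_raters(binary)
fkc = fleiss_kappa(counts)

# Rank raters by average pair-wise kappa, mirroring how the top 7
# scorers were selected in the competition.
top7 = np.argsort(avg_ckc)[::-1][:7]
print(f"Fleiss kappa (2-category, all raters): {fkc:.2f}")
print(f"Top 7 raters by average pair-wise Cohen's kappa: {top7}")
```

On real annotations, the same ranking step would reproduce the selection of the top 7 scorers described in the Results, and re-running the Fleiss kappa on only those columns would give their within-group agreement.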