Retrospective determination of best-case inter-rater agreement between electroencephalographers in a typical clinical workflow
Abstract number: 3.112
Submission category: 3. Clinical Neurophysiology
Year: 2011
Submission ID: 15178
Source: www.aesnet.org
Presentation date: 12/2/2011
Published date: Oct 4, 2011, 07:57 AM
Authors: A. Nizam, S. Chen, S. Wong
Rationale: Standardization of electroencephalography interpretation is a worthwhile goal, but practical determination of inter-rater agreement (IRA, typically measured by kappa scores) to identify areas in need of improvement is difficult. Calculating IRA is labor-intensive, generally requiring individuals to rate the same test items under somewhat artificial conditions. We report a novel method for assessing best-case kappa scores that does not require readers to rate the same test items. This methodology minimizes duplication of work and is better suited to ongoing quality assessment in a typical clinical workflow.

Methods: Findings from adult routine EEG reports acquired over a 1-year period at our university hospital were imported into an SQL database. Sequential SQL queries converted the textual findings into a data matrix. Reader-dependent findings included the presence or absence of sleep, diffuse abnormalities, epileptiform abnormalities, and other abnormalities. Other variables included age, study location (outpatient, medical-surgical ward, and ICU), and reader (4 in total). In univariate analyses, Student's t-tests were used to compare ages across locations, and a z-test for proportions was used to compare the number of readings across locations. Multivariate logistic regression was used to model each reader's tendency to report particular EEG patterns, after adjusting for age and location. Equations for a best-case kappa score were derived from the resulting generalized linear model.

Results: We analyzed 1952 EEG reports from a 1-year period at our university hospital. In univariate analyses, age was negatively associated with sleep and positively associated with epileptiform and diffuse abnormalities. Findings also varied across locations, with abnormalities and disrupted sleep architecture more frequent in progressively more acute locations. After adjusting for age and location and correcting for multiple hypothesis testing, significant inter-reader differences were seen. Mean inter-reader best-case kappa scores were 0.49±0.27, 0.83±0.10, 0.69±0.24, and 0.71±0.16 for sleep, diffuse abnormalities, epileptiform abnormalities, and other abnormalities, respectively, with half of all scores reaching statistical significance. Within-reader (control) best-case kappa scores were 0.79±0.16, 0.88±0.05, 0.84±0.21, and 0.72±0.26, respectively, with no scores reaching statistical significance. Notably, best-case kappa scores could be calculated directly from model coefficients and were invariant to patient age and location.

Conclusions: Best-case kappa can be an effective tool for identifying differences between readers, enabling targeted quality control efforts to improve EEG report standardization. Its critical assumption is that the underlying patient population exhibits similar findings across readers. Its main advantage is its applicability to retrospective report databases and real-world clinical workflows.
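The abstract does not give the explicit equations relating model coefficients to the best-case kappa, so the sketch below is only one plausible formulation, not the authors' method. It fits a logistic regression of a binary finding on age, location, and reader; standardizes each reader's predicted probability over the cohort's observed age/location mix; and then computes the maximum Cohen's kappa attainable given two readers' marginal rates (agreement is maximal when the readers' calls overlap as much as their marginals allow). All data-frame, column, and function names are hypothetical.

```python
"""Illustrative sketch (not the authors' code): an adjusted, marginals-only
"best-case" kappa between two EEG readers, assuming comparable patient
populations across readers."""
import pandas as pd
import statsmodels.formula.api as smf


def reader_adjusted_probs(df: pd.DataFrame, finding: str) -> pd.Series:
    """Fit a logistic model of a binary finding on age, location, and reader,
    then return each reader's predicted probability averaged over the whole
    cohort's age/location mix (i.e., adjusted for case mix)."""
    model = smf.logit(f"{finding} ~ age + C(location) + C(reader)", data=df).fit(disp=0)
    probs = {}
    for r in df["reader"].unique():
        # Predict for every study as if it had been read by reader r,
        # then average: standardization over the observed covariate mix.
        counterfactual = df.assign(reader=r)
        probs[r] = model.predict(counterfactual).mean()
    return pd.Series(probs)


def best_case_kappa(p_i: float, p_j: float) -> float:
    """Maximum attainable Cohen's kappa for two raters of a binary finding
    whose marginal positive-call probabilities are p_i and p_j.
    Observed agreement is at most p_o_max = 1 - |p_i - p_j|."""
    p_o_max = 1.0 - abs(p_i - p_j)
    p_e = p_i * p_j + (1.0 - p_i) * (1.0 - p_j)  # chance agreement
    return (p_o_max - p_e) / (1.0 - p_e)


# Hypothetical usage with a report table exported from the SQL database:
# df = pd.read_sql("SELECT age, location, reader, epileptiform FROM eeg_reports", conn)
# probs = reader_adjusted_probs(df, "epileptiform")
# kappa_12 = best_case_kappa(probs["reader1"], probs["reader2"])
```

Because this formulation depends only on each reader's adjusted marginal probabilities, it never requires two readers to have rated the same studies, which is consistent with the abstract's stated advantage for retrospective databases; the specific standardization and kappa expression above are assumptions for illustration.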