
Consensus Training Significantly Improves Inter-rater Reliability in Categorizing TMS-induced Speech Errors

Abstract number : 2.043
Submission category : 3. Neurophysiology / 3E. Brain Stimulation
Year : 2022
Submission ID : 2205025
Source : www.aesnet.org
Presentation date : 12/4/2022 12:00:00 PM
Published date : Nov 22, 2022, 05:27 AM

Authors :
Shalini Narayana, PhD – University of Tennessee Health Science Center, Le Bonheur Children's Hospital, Memphis TN; Talitha Boardman - University of Tennessee Health Science Center; Fiona Baumer, MD – Assistant Professor, Department of Neurology, Stanford University School of Medicine, Palo Alto, CA; Pediatric Neurology, Lucile Packard Children’s Hospital, Palo Alto, CA; Clifford Calley, MD – University of Texas at Austin, Austin, TX; Dell Children’s Hospital, Austin, TX; Hansel Greiner, MD – Division of Neurology – University of Cincinnati College of Medicine, Cincinnati Children's Hospital, Cincinnati, OH; Anuj Jayakar, MD – Pediatric Neurology – Nicklaus Children’s Hospital, Miami, FL; Brian Lundstrom, MD, Ph.D. – Neurology – Mayo Clinic, Rochester, MN; Negar Noorizadeh, Ph.D. – Neuroscience Institute – Le Bonheur Children's Hospital, Memphis, TN; Donnie Starnes, MD – Neurology – Mayo Clinic, Rochester, MN; Phiroz Tarapore, MD, Ph.D. – Neurological Surgery – University of California, San Francisco, San Francisco, CA; Melissa Tsuboyama, MD – Neurology – Boston Children's Hospital, Harvard Medical School, Boston, MA

Rationale: Transcranial magnetic stimulation (TMS) is a non-invasive language mapping technique increasingly adopted by pediatric epilepsy centers. TMS transiently disrupts cortical function. Language cortex is identified when TMS induces aphasia, semantic errors, or production errors while the patient names objects. Cortical regions that give rise to errors are deemed critical for language and thus influence surgical planning. Currently, there is no consensus to guide error classification; deciding whether a speech error has occurred and, if so, which type relies on the judgement of the individual TMS provider. Therefore, through a nationwide consortium of TMS providers, we aimed to establish consensus guidelines for classifying errors and to explore to what extent training would improve the inter-rater reliability in categorizing TMS-induced speech errors.

Methods: We created a dataset consisting of 65 speech samples (55% left hemisphere trials) derived from 5 patients (13.4 ± 5 y; 3 males) who underwent TMS language mapping studies. Each speech sample included video of the patient naming an object at baseline (without TMS) and while TMS was applied. Providers from 8 TMS laboratories in the US independently categorized responses from this dataset as: no error, speech arrest, production error, semantic error, or muscle stimulation/pain. Fleiss' kappa (k) was calculated between raters to establish the baseline inter-rater reliability (IRR), both overall and for each error type. All providers then met to discuss criteria for categorizing each error type, reviewed a subset (20%) of the samples that had <80% agreement, and reclassified them based on group consensus. Providers then independently reclassified all 24 samples with <80% agreement to update the IRR.
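
As an illustration of the two agreement measures named above, the following is a minimal Python sketch, not the authors' analysis code. It assumes every sample is labelled by the same number of raters, and it assumes that per-sample "agreement" means the share of raters who chose the most common label; the abstract does not state either detail. The function names and the 8-rater example are hypothetical.

```python
from collections import Counter

# The five response categories listed in the Methods.
CATEGORIES = [
    "no error",
    "speech arrest",
    "production error",
    "semantic error",
    "muscle stimulation/pain",
]


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa for a list of samples, each a list with one label per rater.

    Assumes the same number of raters (at least 2) for every sample.
    """
    n_samples = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(sample) != n_raters for sample in ratings):
        raise ValueError("every sample needs the same number (>= 2) of ratings")
    if any(label not in categories for sample in ratings for label in sample):
        raise ValueError("unknown category label")

    # counts[i][j] = number of raters who assigned sample i to category j
    counts = [[Counter(sample)[c] for c in categories] for sample in ratings]

    # Observed agreement: mean over samples of the proportion of agreeing rater pairs.
    per_sample = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_sample) / n_samples

    # Chance agreement: sum of squared overall category proportions.
    total = n_samples * n_raters
    proportions = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_chance = sum(p * p for p in proportions)

    if p_chance == 1.0:
        return float("nan")  # kappa is undefined when only one category is ever used
    return (p_observed - p_chance) / (1.0 - p_chance)


def modal_agreement(sample):
    """Share of raters who chose the most common label (assumed definition)."""
    return max(Counter(sample).values()) / len(sample)


def low_agreement_indices(ratings, threshold=0.8):
    """Indices of samples whose modal agreement falls below the threshold."""
    return [i for i, sample in enumerate(ratings) if modal_agreement(sample) < threshold]
```

With 8 hypothetical raters, a sample on which 7 raters choose the same label has a modal agreement of 0.875 and clears the 80% threshold, while a 6-to-2 split (0.75) would be flagged for re-review by `low_agreement_indices`.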

Results: Prior to the consensus meeting, 63% of speech samples achieved ≥80% agreement between raters (Figure 1), and overall there was good IRR, with k = 0.595. Agreement between providers was highest for speech arrest errors, while production errors and muscle stimulation/pain had low agreement (see Table 1 for details). Following the meeting, the 37% (n=24) of samples with <80% agreement were again reviewed by each rater. Concordance between raters increased such that 80% of samples had at least 80% agreement (Figure 1). IRR significantly improved (paired t-test, P=0.02) across all domains (Table 1). Notably, overall agreement between raters improved by 13%, and the agreement for categorizing production error and muscle stimulation/pain improved by more than 50%.

Conclusions: We demonstrate that inter-rater reliability of categorizing TMS-induced speech errors can be significantly improved by consensus discussion. The meeting between TMS providers across many centers facilitated defining criteria for classifying each error type and standardizing the error rating across institutions. A critical next step will be testing whether standardized TMS language mapping reporting will improve the accuracy of TMS-derived language maps and further facilitate its integration into surgical planning.

Funding: AES Infrastructure Grant