
Randomized Data Splitting Leads to Inflated Performance for Seizure Forecasting Algorithms

Abstract number : 3.111
Submission category : 2. Translational Research / 2D. Models
Year : 2023
Submission ID : 984
Source : www.aesnet.org
Presentation date : 12/4/2023
Published date :

Authors :
Presenting Author: Camden Shultz, BS – The Johns Hopkins University

Trevor Meyer, Graduate Student – Electrical and Computer Engineering – The Johns Hopkins University; Pedro Irazoqui, PhD – Electrical and Computer Engineering – The Johns Hopkins University

Rationale:

Seizure forecasting algorithms could play a vital role in managing epilepsy by providing timely warnings and enabling proactive interventions. However, these algorithms have historically struggled when translated to unseen, real-time data [1]. Traditionally, machine learning (ML) pipelines assume data points are independent, naively dividing data into training, validation, and testing sets at random across time. We explore different data-splitting strategies and show that the most common randomized strategy inflates reported performance; more representative, time-aware splits yield significantly lower accuracies.



Methods:

We use the publicly available CHB-MIT EEG dataset [2] and extract 5-sec segments from either four hours before/after the start/end of a seizure (baseline) or within one hour of seizure onset (preictal), undersampling the majority class to balance the dataset. Segments are organized into training, validation, and testing sets using one of three strategies with varying time awareness: Random Split (RS), Divided Split (DS), or Temporal Split (TS) (Figure 1). We compare these by evaluating the performance of ML models from the literature: a Convolutional Neural Network (CNN); a CNN followed by a Long Short-Term Memory network (CNN-LSTM) [3]; and a Time Scale Network (TiSc Net) [4]. Due to the patient-specific nature of epilepsy, we train a new network for each patient, using 10-fold cross-validation on 70% of the data for parameter tuning and holding out the remaining 30% for final performance metrics.
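
A minimal sketch of the three strategies, assuming segments are stored in chronological order with one class label per segment; the function names, the 70/30 proportions, and the choice to assign the earlier portion of each span to training are illustrative, not taken from the authors' code:

import numpy as np

def random_split(n_segments, test_frac=0.3, seed=0):
    # RS: shuffle all segment indices, ignoring time entirely.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_segments)
    n_test = int(test_frac * n_segments)
    return idx[n_test:], idx[:n_test]  # train indices, test indices

def divided_split(labels, test_frac=0.3):
    # DS: split each contiguous run of one class chronologically,
    # earlier portion to training, later portion to testing.
    labels = np.asarray(labels)
    run_edges = np.r_[0, np.flatnonzero(np.diff(labels)) + 1, len(labels)]
    train, test = [], []
    for start, end in zip(run_edges[:-1], run_edges[1:]):
        cut = start + int((1 - test_frac) * (end - start))
        train.extend(range(start, cut))
        test.extend(range(cut, end))
    return np.array(train), np.array(test)

def temporal_split(n_segments, test_frac=0.3):
    # TS: a single chronological cut across the entire sequence.
    cut = int((1 - test_frac) * n_segments)
    return np.arange(cut), np.arange(cut, n_segments)

Under RS, segments recorded seconds apart can land in both the training and testing sets, so temporal correlation leaks across the split; DS keeps each class span partly held out, and TS is the only strategy that tests strictly on future data.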



Results:

Models evaluated with RS report higher accuracies than with DS or TS, i.e., RS leads to over-optimistic results. As shown in Figure 2, RS yields 90.9%, 88.5%, and 90.2% accuracy for CNN, CNN-LSTM, and TiSc Net, respectively. In comparison, DS deflates average accuracy by 11.7% and 12.4%, and TS by an even greater 27% and 32.5%, for the CNN and CNN-LSTM models respectively. TiSc Net loses only 7.3% (DS) and 15.9% (TS) in accuracy, implying this architecture may be more robust.
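
For concreteness, the deflation in Figure 2A can be read as the drop in test accuracy relative to RS, averaged across subjects; a minimal sketch under that assumption (the accuracy arrays below are placeholders, not the study's data):

import numpy as np

# Placeholder per-subject test accuracies (%); not the study's data.
acc_rs = np.array([92.0, 89.5, 91.0])  # Random Split
acc_ds = np.array([80.0, 78.5, 81.0])  # Divided Split
acc_ts = np.array([64.0, 62.5, 65.0])  # Temporal Split

# Deflation: per-subject accuracy drop relative to RS, then averaged.
print(f"DS deflation: {np.mean(acc_rs - acc_ds):.1f} points")
print(f"TS deflation: {np.mean(acc_rs - acc_ts):.1f} points")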



Conclusions:

DS and TS consistently yield lower accuracies than RS, which may explain the disparity between the glowing offline results of forecasting algorithms and the poor performance seen in real time. Using DS or TS could help researchers identify algorithms that translate to unseen data and increase efficacy in real-world applications.

Figure 1. Illustration of data splitting strategies. RS randomly assigns segments to the training and testing sets in fixed proportions, ignoring time. DS divides each continuous class span chronologically, whereas TS divides chronologically across the entire sequence. Validation set not shown for brevity.

Figure 2. (A) Model accuracy deflation (across all subjects and models) due to TS and DS compared to RS. (B) Accuracy distributions grouped by model and data split.

[1] D. Freestone et al., Current Opinion in Neurology, 2017

[2] A. Shoeb, PhD thesis, MIT, 2009

[3] H. Daoud et al., IEEE Transactions on Biomedical Circuits and Systems, 2019

[4] T. Meyer et al., IEEE Journal of Selected Topics in Signal Processing, 2023 (under review)



Funding: NIH NS119390
