Methods & Study Design
Machine Learning Process:
Data Collection
Data set obtained from
Contains 11,500 observations of epileptic and non-epileptic EEGs from 500 patients
Response variable: seizure activity class (1-5); explanatory variables: EEG readings at successive times (178 total)
Data Preparation
Transformed to binary response variable (1: seizure, 0: no seizure)
Used randomization to split data into Training (80%) and Test (20%) sets
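The two preparation steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the fixed seed, and the toy data are assumptions, and the source does not say which tool performed the split.

```python
import random

def binarize(label):
    """Map the original 1-5 class label to 1 (seizure) or 0 (no seizure)."""
    return 1 if label == 1 else 0

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly, then split them into training and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy stand-in for the real data: (178 readings, original label) pairs.
toy_rows = [([0.0] * 178, label) for label in [1, 2, 3, 4, 5] * 20]
prepared = [(readings, binarize(label)) for readings, label in toy_rows]
train_rows, test_rows = split_train_test(prepared)
print(len(train_rows), len(test_rows))
```

Fixing the seed keeps the split reproducible, which matters when several models are later compared on the same test set.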
Model Training
Three classification models were developed using the training data set: Logistic Regression, Support Vector Machines (SVM), and Long Short-Term Memory (LSTM)
Cross-validation on the training data was used to tune the models and reduce the risk of over-fitting
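As a rough illustration of cross-validation, the sketch below generates k-fold train/validation index splits. The number of folds (k = 5 by default) and the function name are assumptions; the source does not state which cross-validation scheme was used.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in k_fold_indices(10, k=3):
    print(len(train_idx), len(val_idx))
```

Each model is fit on the training indices and scored on the validation indices, and the scores are averaged across folds.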
Model Evaluation
Model results were compared using the test data set
Confusion matrices and Receiver Operating Characteristic (ROC) curves were used to assess model sensitivity and false positive rate
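The evaluation quantities named above can be computed directly from predictions. The following is a minimal sketch with made-up labels and scores (not results from this study); the function names and thresholds are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means seizure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def sensitivity_and_fpr(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); false positive rate = FP / (FP + TN)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return sensitivity, fpr

def roc_points(y_true, scores, thresholds):
    """One (fpr, sensitivity) point per threshold; plotted together they form the ROC curve."""
    points = []
    for thr in thresholds:
        y_pred = [1 if s >= thr else 0 for s in scores]
        sens, fpr = sensitivity_and_fpr(y_true, y_pred)
        points.append((fpr, sens))
    return points

# Made-up example values, for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]
for fpr, sens in roc_points(y_true, scores, [0.0, 0.5, 1.01]):
    print(round(fpr, 3), round(sens, 3))
```

Lowering the threshold raises sensitivity at the cost of more false positives, which is the trade-off the ROC curve visualizes.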
• Each observation consists of 178 data points recorded over a 23.6 sec interval (explanatory variables)
• The y column (response variable) classifies each observation as 1-5; this was transformed to a binary response, where 1 is seizure and 0 is no seizure
• Each individual data point is an EEG reading from a section of a patient’s brain at a given point in time
23 segments of EEG data from each of 500 patients were used in this data set
Each 23.6 sec segment contains 178 readings (0.133 sec interval)
Each segment was classified as 1-5, where 1 contained seizure activity, 2 and 3 were seizure free intervals, and 4 and 5 were healthy volunteers
All EEG signals were recorded with the same 128-channel amplifier system and written continuously onto the disk of a data acquisition computer system
Our Classification Models
Logistic Regression
Traditional classification method for binary response variables
Data is fit using a sigmoid function
A decision threshold is selected to convert the predicted probability into a class label
Simple method that generates easily interpretable results
Does not typically perform well with complex data
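To make the sigmoid and threshold steps concrete, here is a minimal sketch of how a fitted logistic regression model turns features into a class label. The weights, bias, and 0.5 threshold are hypothetical values chosen for illustration, not parameters from this study.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, features):
    """Predicted probability of class 1: sigmoid of the linear combination."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def classify(probability, threshold=0.5):
    """Apply the decision threshold to obtain a 0/1 label."""
    return 1 if probability >= threshold else 0

# Hypothetical two-feature model, for illustration only.
p = predict_proba([1.0, -1.0], 0.0, [2.0, 0.5])
print(round(p, 3), classify(p))
```

Because the prediction is a sigmoid of a weighted sum, each weight can be read as the change in log-odds per unit change in its feature, which is what makes the results easy to interpret.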
Support Vector Machines (SVM)
Powerful and flexible method for supervised learning classification
Goal is to find the boundary that separates the data classes with the maximum margin (the gap between the closest data points of different classes)
SVM typically offers accurate results and works well with high-dimensional data
May require longer training times
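The margin idea can be illustrated for the simplest case, a linear SVM. The sketch below assumes a linear kernel and labels coded as +1/-1 (the source does not state which kernel was used); with the standard scaling where the closest points satisfy |w·x + b| = 1, the margin width is 2 / ||w||. The weight values are hypothetical.

```python
import math

def decision_value(w, b, x):
    """Signed score w·x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Gap between the two margin hyperplanes of a linear SVM: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def hinge_loss(w, b, x, y):
    """Zero when x is on the correct side and outside the margin (y is +1 or -1)."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

w, b = [3.0, 4.0], 0.0  # hypothetical weights, for illustration only
print(margin_width(w))
print(hinge_loss([1.0, 0.0], 0.0, [0.5, 0.0], 1))
```

Training minimizes the hinge loss while keeping ||w|| small, which is equivalent to making the margin as wide as possible.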
Long Short-Term Memory (LSTM)
Complex method based on Recurrent Neural Networks (RNN)
Unlike traditional feed-forward neural networks, the outputs of RNNs depend on prior sequence elements (memory)
LSTM expands the memory of RNNs using memory blocks
LSTM is well suited for complex, time-series data
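The memory mechanism can be sketched as one time step of a single-unit LSTM cell with a scalar input. This follows the standard LSTM gate equations rather than the specific network used here; the parameter layout and the weight values are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    params maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h and the new cell (memory) state c.
    """
    def gate(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)        # how much old memory to keep
    i = gate("input", sigmoid)         # how much new information to write
    o = gate("output", sigmoid)        # how much memory to expose
    g = gate("candidate", math.tanh)   # proposed new memory content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Hypothetical weights; a trained network would learn these.
params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for reading in [0.1, -0.3, 0.8]:  # stand-in for successive EEG readings
    h, c = lstm_step(reading, h, c, params)
print(h, c)
```

The cell state c is carried from one reading to the next, so the output at each step can depend on readings seen much earlier in the sequence.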