Methods & Study Design
Machine Learning Process:
Data Collection
-
Data set obtained from Kaggle.com
-
Contains 11,500 observations of epileptic and non-epileptic EEGs from 500 patients
-
Response variable: Seizure activity (1-5); Explanatory variables: EEG readings at successive times (178 total)
Data Preparation
-
Transformed to binary response variable (1: seizure, 0: no seizure)
-
Used randomization to split data into Training (80%) and Test (20%) sets
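The two preparation steps can be sketched in plain Python. This is a minimal illustration, not the code used in the study: the function names `to_binary` and `random_split`, the fixed seed, and the toy labels are all assumptions made for the example.

```python
import random

def to_binary(label):
    """Map the original 1-5 class label to 1 (seizure) or 0 (no seizure)."""
    return 1 if label == 1 else 0

def random_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then split into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy stand-in for the y column: 10 segments with hypothetical labels 1-5.
labels = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
binary = [to_binary(y) for y in labels]
train, test = random_split(binary)
```

With the full data set, the same 80/20 split would place 9,200 observations in training and 2,300 in test.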
Model Training
-
Three classification models were developed using the training data set: logistic regression, Support Vector Machines (SVM), and Long Short-Term Memory (LSTM) networks
-
Cross-validation was used to improve performance and reduce the risk of over-fitting
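As an illustration of how cross-validation partitions the training set, the sketch below generates k-fold index splits. The number of folds is not stated here, so `k=5` is an assumption, as is the helper name `k_fold_indices`; the rows are assumed to have been shuffled already.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k roughly equal, contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# Each sample is held out for validation exactly once across the folds.
folds = list(k_fold_indices(10, k=5))
```

A model would be fit on each `train_idx` subset and scored on the matching `val_idx` subset, and the scores averaged across folds.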
Model Evaluation
-
Model results were compared using the test data set
-
Confusion matrices and Receiver Operating Characteristic (ROC) curves were used to assess model sensitivity and false positive rate
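The two quantities named above follow directly from the confusion matrix counts. The sketch below uses made-up labels and predictions purely for illustration; the function names are hypothetical, not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = seizure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """True positive rate: share of actual seizures that were detected."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Share of non-seizure segments wrongly flagged as seizure."""
    return fp / (fp + tn)

# Hypothetical test labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

An ROC curve plots sensitivity against false positive rate as the classification threshold is varied, so each point on the curve corresponds to one such confusion matrix.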
-
Each observation consists of 178 data points recorded over a 23.6 sec interval (explanatory variables)
-
The y column (response variable) classifies each observation as 1-5; this was transformed to a binary response, where 1 is seizure and 0 is no seizure
-
Each individual data point is a recording of a section of a patient’s brain at a given point in time
-
23 segments of EEG data for 500 patients were used in this data set
-
Each 23.6 sec segment contains 178 readings (0.133 sec interval)
-
Each segment was classified as 1-5, where 1 contained seizure activity, 2 and 3 were seizure-free intervals, and 4 and 5 were from healthy volunteers
-
All EEG signals were recorded with the same 128-channel amplifier system and written continuously onto the disk of a data acquisition computer system
Our Classification Models
Logistic Regression
-
Traditional classification method for binary response variables
-
Data is fit using a sigmoid function
-
A decision threshold is selected to convert the predicted probability into a class label
-
Simple method that generates easily interpretable results
-
Does not typically perform well with complex data
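The sigmoid and the decision step can be sketched as follows. The weights, bias, and the 0.5 threshold are hypothetical values chosen for illustration, not fitted coefficients from the study, and only two features are used instead of 178 to keep the example short.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Predicted probability of the positive class (seizure)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def classify(probability, threshold=0.5):
    """Apply the decision threshold to get a 0/1 label."""
    return 1 if probability >= threshold else 0

# Hypothetical coefficients and a single two-feature observation.
weights = [0.8, -0.4]
bias = -0.1
p = predict_proba([2.0, 1.0], weights, bias)
label = classify(p)
```

Lowering the threshold raises sensitivity at the cost of more false positives, which is the trade-off an ROC curve makes visible.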
Support Vector Machines (SVM)
-
Powerful and flexible method for supervised classification
-
Goal is to divide the data classes and find the maximum margin (the gap between the closest data points of different classes)
-
SVM typically offers accurate results and works well with high-dimensional data
-
May require longer training times
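For a linear SVM, the margin has a simple closed form. The sketch below assumes a linear kernel (the kernel used in the study is not stated here) and the usual scaling in which the closest points satisfy w·x + b = ±1, so the gap between the two classes is 2/||w||. The values of `w` and `b` are hypothetical.

```python
import math

def decision_value(x, w, b):
    """Signed score w·x + b; its sign gives the predicted side of the boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the two margin hyperplanes w·x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical separating hyperplane in two dimensions.
w = [3.0, 4.0]
b = -5.0
width = margin_width(w)                      # ||w|| = 5, so width = 0.4
score = decision_value([3.0, 4.0], w, b)     # positive side of the boundary
```

Training an SVM amounts to choosing `w` and `b` so that this width is as large as possible while keeping the classes on their correct sides, subject to a penalty for violations.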
Long Short-Term Memory (LSTM)
-
Complex method based on Recurrent Neural Networks (RNN)
-
Unlike traditional neural networks, the outputs of RNNs depend on prior sequence elements (memory)
-
LSTM expands the memory of RNNs using memory blocks
-
LSTM is well suited for complex, time-series data
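To make the idea of a memory block concrete, the sketch below steps a single-unit LSTM cell with scalar input through a short sequence, using the standard forget, input, and output gates. It is a toy illustration only: the parameter dictionary, its values, and the input sequence are hypothetical, and a practical model would use vector-valued states and learned weights from a deep learning library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell; returns (h, c)."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate memory
    c = f * c_prev + i * g      # cell state: keep part of the old, add part of the new
    h = o * math.tanh(c)        # hidden state exposed to the next step
    return h, c

# Hypothetical parameters: all weights 0.5, all biases 0.
params = {k: 0.5 for k in ("wf", "uf", "wi", "ui", "wo", "uo", "wg", "ug")}
params.update({k: 0.0 for k in ("bf", "bi", "bo", "bg")})

h, c = 0.0, 0.0
for x in [0.1, -0.2, 0.3]:      # stand-in for successive EEG readings
    h, c = lstm_step(x, h, c, params)
```

The cell state `c` is what carries information across time steps: the forget gate decides how much of it survives, which is how the LSTM retains longer-range context than a plain RNN.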