Methods & Study Design

Machine Learning Process:

Machine Learning Process Flow Chart.png

Data Collection

  • Data set obtained from Kaggle.com

  • Contains 11,500 observations of epileptic and non-epileptic EEGs from 500 patients

  • Response variable: seizure activity (1-5); explanatory variables: EEG readings at successive times (178 total)

Data Preparation

  • Transformed to binary response variable (1: seizure, 0: no seizure)

  • Used randomization to split data into Training (80%) and Test (20%) sets
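The two preparation steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study: the helper names `binarize` and `train_test_split`, the fixed seed, and the toy rows are all assumptions made for the example.

```python
import random

def binarize(label):
    """Map the original 1-5 label to 1 (seizure) or 0 (no seizure)."""
    return 1 if label == 1 else 0

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into training and test lists."""
    shuffled = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility (assumption)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy stand-in for the data set: 100 rows of 178 readings with labels 1-5.
raw_rows = [([0.0] * 178, label) for label in [1, 2, 3, 4, 5] * 20]
rows = [(readings, binarize(label)) for readings, label in raw_rows]

train, test = train_test_split(rows)
print(len(train), len(test))  # 80 20
```

Only label 1 maps to the positive class, so in the original five-class data roughly one observation in five would be a seizure case if the classes are balanced.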

Model Training

  • Three classification models were developed using the training data set: logistic regression, Support Vector Machines (SVM), and Long Short-Term Memory (LSTM)

  • Cross-validation was used to improve performance and prevent over-fitting
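As a rough sketch of how k-fold cross-validation works, the training rows are divided into k folds, and each fold in turn is held out for validation while the model is fit on the rest. The number of folds, the helper names, and the toy majority-class "model" below are assumptions for illustration; the source does not state which cross-validation scheme was used.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) index lists for k roughly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_val_score(fit, score, X, y, k=5):
    """Average the validation score of a model fit on each training fold."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)

# Hypothetical stand-in model: always predict the most common training label.
def fit_majority(X, y):
    return max(set(y), key=y.count)

def accuracy(model, X, y):
    return sum(1 for label in y if label == model) / len(y)

X = [[i] for i in range(10)]
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
mean_acc = cross_val_score(fit_majority, accuracy, X, y, k=5)
print(mean_acc)
```

The folds here are contiguous, so the rows should already be in random order before they are passed in.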

Model Evaluation

  • Model results were compared using the test data set

  • Confusion matrices and Receiver Operating Characteristic (ROC) curves were used to assess model sensitivity and false positive rate
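To make the evaluation metrics concrete, the sketch below counts the four confusion-matrix cells for a binary classifier, derives sensitivity and false positive rate from them, and traces ROC points by sweeping the decision threshold over the predicted scores. The function names and the toy labels and scores are assumptions for illustration, not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sensitivity_and_fpr(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); false positive rate = FP / (FP + TN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), fp / (fp + tn)

def roc_points(y_true, scores):
    """(FPR, sensitivity) pairs, one per distinct score used as the threshold."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        sens, fpr = sensitivity_and_fpr(y_true, y_pred)
        points.append((fpr, sens))
    return points

# Toy example: six observations with made-up predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_counts(y_true, y_pred))  # (3, 1, 2, 0)
print(roc_points(y_true, scores))
```

Both rates require at least one positive and one negative observation; otherwise the denominators are zero.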

Raw Data.png
  • Each observation consists of 178 data points recorded over the 23.6 sec interval (explanatory variables)

  • The y column (response variable) classifies each observation as 1-5, which can be transformed to a binary response of 1 or 0, where 1 is seizure and 0 is no seizure

  • Each individual data point is an EEG reading from a section of a patient’s brain at a given point in time

EEG Signal 1-5.png
  • 23 segments of EEG data for 500 patients were used in this data set

  • Each 23.6 sec segment contains 178 readings (0.133 sec interval)

  • Each segment was classified as 1-5, where 1 contained seizure activity, 2 and 3 were seizure-free intervals, and 4 and 5 were healthy volunteers

  • All EEG signals were recorded with the same 128-channel amplifier system and written continuously onto the disk of a data acquisition computer system

Our Classification Models

Logistic Regression

  • Traditional classification method for binary response variables

  • Data is fit using a sigmoid function

  • A decision threshold is applied to the predicted probability to assign each observation to a class

  • Simple method that generates easily interpretable results

  • Does not typically perform well with complex data   
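The prediction step of logistic regression can be sketched in a few lines: a weighted sum of the inputs is passed through the sigmoid to get a probability, which is then compared with a threshold. The weights, bias, and 0.5 threshold below are made-up values for illustration; in practice the weights are learned from the training data.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Predicted probability of the positive class (seizure) for one observation."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def predict(weights, bias, x, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

# Hypothetical two-feature model; the real model would have 178 weights.
weights, bias = [1.0, -1.0], 0.0
print(predict(weights, bias, [2.0, 0.5]))  # 1
print(predict(weights, bias, [0.5, 2.0]))  # 0
```

Lowering the threshold raises sensitivity at the cost of more false positives, which is the trade-off an ROC curve traces out.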

Logistic Regression.jpg

Support Vector Machines (SVM)

  • Powerful and flexible method for supervised learning classification

  • Goal is to find the boundary that separates the data classes with the maximum margin (the gap between the closest data points of different classes)

  • SVM typically offers accurate results and works well with high-dimensional data

  • May require longer training times
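To illustrate what "maximum margin" means for a linear boundary, the sketch below measures how far the closest point lies from a candidate separating hyperplane w·x + b = 0. It only evaluates given hyperplanes; it does not train an SVM, and the points, labels (coded here as +1 and -1), and candidate hyperplanes are invented for the example.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def geometric_margin(w, b, X, y):
    """Distance from the hyperplane to the closest point (labels are +1 or -1).

    The value is negative if any point falls on the wrong side.
    """
    return min(label * signed_distance(w, b, x) for x, label in zip(X, y))

# Two toy classes separated along the first coordinate.
X = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
y = [-1, -1, 1, 1]

candidates = {
    "centered": ((1.0, 0.0), -1.5),
    "off-center": ((1.0, 0.0), -1.0),
    "tilted": ((1.0, 1.0), -2.0),
}
for name, (w, b) in candidates.items():
    print(name, round(geometric_margin(w, b, X, y), 3))
```

All three candidates separate the classes, but the centered one leaves the widest gap, and that is the boundary a maximum-margin classifier would prefer.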

Support Vector Machines.png

Long Short-Term Memory (LSTM)

  • Complex method based on Recurrent Neural Networks (RNN)

  • Unlike traditional neural networks, the outputs of RNNs depend on prior sequence elements (memory)

  • LSTM expands the memory of RNNs using memory blocks

  • LSTM is well suited for complex, time-series data
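The memory mechanism can be sketched with a single-unit LSTM cell using the standard gate equations: a forget gate scales the old cell state, an input gate admits a new candidate value, and an output gate decides how much of the cell state is exposed. The parameter values and the short input sequence are made up for illustration, and inputs are assumed to be scaled to a small range; a real model would use vectors of units with learned weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    params maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h and cell state c.
    """
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    g = gate("candidate", math.tanh)   # proposed new content
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Hypothetical parameters and a short, already-scaled signal.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.8, -0.2, 0.0),
    "candidate": (1.0, 0.3, 0.0),
    "output": (0.6, 0.1, 0.0),
}
h, c = 0.0, 0.0
for x in [0.1, -0.4, 0.9, 0.2]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the cell state `c` is carried from step to step and only rescaled by the forget gate, information from early readings can influence the output many steps later.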

SVM Model Picture.png