ISSN: 0970-938X (Print) | 0976-1683 (Electronic)

Biomedical Research

An International Journal of Medical Sciences

Research Article - Biomedical Research (2018) Artificial Intelligent Techniques for Bio Medical Signal Processing: Edition-II

Investigation of a wavelet-based neural network learning algorithm applied to P300 based brain-computer interface

Enas Abdulhay*, Rami Oweis, Areej Mohammad and Lujain Ahmad

Biomedical Engineering Department, Faculty of Engineering, Jordan University of Science and Technology, Irbid, Jordan

*Corresponding Author:
Enas Abdulhay
Biomedical Engineering Department
Faculty of Engineering
Jordan University of Science and Technology, Jordan

Accepted on May 31, 2017

DOI: 10.4066/biomedicalresearch.29-17-432


Abstract

The paper presented herein proposes an algorithm that aims at improving the classification accuracy of a Brain-Computer Interface (BCI) speller. In this work, a feed-forward neural network with back-propagation learning is used for classification. The proposed algorithm was tested on two datasets, namely the Berlin BCI Competition III dataset and the EPFL BCI group dataset. For the first dataset, using 64 electrodes with 30 hidden neurons yields an accuracy of 94.9%, while an average accuracy of 95.8% (range: 92%-100%) was obtained for the second dataset when using a 32-electrode configuration with 20 hidden neurons. The accuracy levels obtained in this study are higher than those of other recent classification approaches.

Keywords

P300, EEG, Classification, Neural network, Wavelet, Layer.

Introduction

People with motor disabilities and neuromuscular disorders such as spinal cord injuries, Amyotrophic Lateral Sclerosis (ALS), and "locked-in" syndrome are limited in their ability to interact with the surrounding world. Their motor neurons degenerate and can no longer send impulses to the muscle fibers that normally produce muscle movement. This in turn results in muscle atrophy: limbs begin to look "thinner", and those most severely affected may lose all voluntary movement [1]. Therefore, recent advancement in Brain-Computer Interface (BCI) research has encouraged the development of new non-muscular communication channels that allow people with motor disabilities to interact with the environment by controlling communication facilities such as computers and speech synthesizers, which would consequently improve their quality of life [2].

Target signals issued from the patient should be translated into commands [3]. These signals can be acquired by recording brain activity (EEG) [4]; the obtained waveforms contain the information needed to infer the user's intentions. EEG measures the brain's electrical activity caused by the flow of electric currents during synaptic excitations of the dendrites of neurons. The EEG signal is measured as the potential difference over time between an active electrode and a reference electrode; a third electrode, known as the ground electrode, serves as the common reference for this differential measurement. The electrodes placed over the scalp commonly follow the international 10-20 system. EEG is the most common method for brain-signal detection because of its high temporal resolution, relatively low cost, high portability and low risk to users [1].

Since the brain-activity voltage measured by a given electrode is a relative measure, the measurement must be compared to the voltage at a reference site. This results in a combination of voltages: brain activity and noise. For this reason, the reference should be chosen at a site where brain activity is almost zero. In general, there are three referencing methods: the common reference method, the average reference, and current source density (CSD) [5].
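
As an illustration of one of these schemes, the following is a minimal MATLAB sketch of average re-referencing; the variable name eeg (a channels × samples matrix) is hypothetical:

    % Average reference: subtract the instantaneous mean across all
    % electrodes from every channel (eeg: channels x samples).
    eegAvgRef = bsxfun(@minus, eeg, mean(eeg, 1));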

There are currently several major categories of BCIs in use that are classified based on the type of neurophysiologic signal they utilize. These categories include, but are not limited to, Visual Evoked Potentials (VEPs), P300 elicitation, alpha and beta rhythm activity, slow cortical potentials (SCPs), and microelectrode cortical neuronal recordings [3]. P300 waves are evoked potentials that are elicited in response to specific stimuli, while SCPs occupy the lowest frequency range of the EEG signal and are associated with cortical activation and deactivation [3].

The BCI system considered here aims specifically at detecting the P300 signal and interpreting it in order to infer the user's intent. The P300 is a positive deflection in the human EEG, appearing approximately 300 ms after the presentation of rare or surprising, task-relevant stimuli. In P300-based BCI, a matrix of successively flashing choices is presented on a screen, and scalp EEG is recorded over the centro-parietal area. A main advantage of the P300 paradigm is the high number of available choices; only the choice desired by the user evokes a large P300 potential (i.e., a high-amplitude positive deflection about 300 ms after the flash). A BCI is a pattern recognition system that classifies each pattern into a class according to its extracted features. Feature selection is used to identify discriminative information in the brain signals and map it to feature vectors. The main feature selection and extraction methods used to obtain relevant signal characteristics are listed in [1].

Classification, which follows feature selection and extraction, is the most important and challenging step; it aims at recognizing the user's intention from the provided feature vectors [1]. The design of the classification step involves one algorithm or a combination of algorithms. These algorithms should reduce dimensionality and manage the bias-variance tradeoff [6-8].

The proposed work presents an algorithm that aims at improving the accuracy of the BCI speller. The approach is to combine a feed-forward neural network classifier with feature extraction methods, for two datasets: Berlin BCI Competition III [9] and the EPFL BCI group [10]. The ability to learn from examples is one of the most important properties of neural networks; once trained, they are capable of recognizing patterns related to the training data [11,12]. A comparison with accuracy levels achieved by recent classification approaches is presented in the discussion section.

Materials and Methods

Dataset 1

Dataset and paradigm: The proposed work has been applied to Dataset II from the BCI Competition III Challenge 2004 provided by the Wadsworth Center (Subject A) [9]. The subject went through five sessions, each consisting of multiple runs. In each run, the subject was asked to focus attention on a series of prescribed characters in the paradigm. P300 evoked potentials appear in the EEG in response to the intensification of a row or column containing the desired character. Each row and column of the matrix was randomly intensified for 100 ms, resulting in 12 different stimuli (6 rows and 6 columns). After the intensification of a row/column, the matrix was blank for 75 ms. Row/column intensifications were block-randomized in blocks of 12. The sets of 12 intensifications were repeated 15 times for each character, so there were 180 intensifications in total for each character epoch.

For a given acquisition session, all EEG signals were collected continuously from a 64-channel scalp acquisition system. Signals were bandpass-filtered (0.1-60 Hz) and digitized at a sampling rate of 240 Hz, then decimated by a factor of 2. A more detailed description of the dataset can be found in the BCI competition paper [9].

The paradigm used for recording the data was described by Donchin et al. and originally by Farwell and Donchin (1988) [13]. It consists of a 6 × 6 matrix of characters whose rows and columns are successively and randomly intensified at a rate of 5.7 Hz.

Single trials extraction: As described in the data description, the providers merged all the data (runs) of a session into one continuous signal. A trial extraction stage is therefore required; it is achieved by extracting all data samples between 0 and 667 ms after the onset of each of the 180 intensifications.
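
As a minimal sketch of this step (the variable names session and onsets are hypothetical; the window length follows from the 240 Hz sampling rate):

    % Extract a 0-667 ms window (about 160 samples at 240 Hz) after each
    % intensification onset. session: channels x samples; onsets: sample indices.
    fs = 240;
    winLen = round(0.667 * fs);          % ~160 samples per single trial
    trials = cell(1, numel(onsets));
    for k = 1:numel(onsets)
        trials{k} = session(:, onsets(k) : onsets(k) + winLen - 1);
    end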

Filtering and decimation: Filtering is a crucial step in noise reduction, since certain types of noise and artifacts occur at known frequencies. After trial extraction, each signal was filtered with an 8th-order Chebyshev Type I band-pass filter [14] whose cut-off frequencies are 0.1 and 20 Hz [15], and was then decimated according to the high cut-off frequency.
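
A minimal MATLAB sketch of this stage follows; the passband ripple (0.5 dB) and the decimation factor of 6 (240 Hz → 40 Hz, which still covers the 20 Hz cut-off) are assumptions, and X is a hypothetical channels × samples matrix for one trial:

    % Order-8 Chebyshev Type I band-pass (cheby1 with a two-element band
    % returns twice the prototype order), followed by decimation.
    fs = 240;
    [b, a] = cheby1(4, 0.5, [0.1 20] / (fs/2));
    r = 6;                                % assumed decimation factor (240 -> 40 Hz)
    nOut = ceil(size(X, 2) / r);
    Xd = zeros(size(X, 1), nOut);
    for ch = 1:size(X, 1)
        filtered = filter(b, a, X(ch, :));
        Xd(ch, :) = decimate(filtered, r);   % decimate applies its own anti-alias filter
    end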

Feature extraction (Discrete wavelet transform): Since the ERP (Event-Related Potential) is a transient signal, time-frequency features are more appropriate. Such features can be obtained with the wavelet transform, an efficient tool for multi-resolution analysis of non-stationary and transient signals. The wavelet transform is potentially one of the most powerful signal processing techniques because of its ability to adapt to signal components.

The algorithm decomposes the original signal into two parts, called the approximation coefficients and the detail coefficients respectively. The DWT coefficients are obtained by convolving x(n) with dilated/compressed and shifted versions of a wavelet function Ψj,k(n) that presents a specific oscillation model. The selection of the mother wavelet and of a proper decomposition level is very important in the DWT [16]. The Daubechies family of wavelets was chosen because these wavelets are orthogonal and easy to implement [17]. The approximation coefficients of the level-5 decomposition are used as the features.
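
A minimal sketch of this feature extraction, assuming a single-channel single-trial vector named trial:

    % Five-level discrete wavelet decomposition with the db4 mother wavelet;
    % the level-5 approximation coefficients serve as the feature vector.
    [C, L] = wavedec(trial, 5, 'db4');
    features = appcoef(C, L, 'db4', 5);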

Normalization: Normalization is performed to allow comparison of different signals and to make the data more informative. It is carried out by dividing the signals by a reference value, so that the factors affecting the signal and the reference are the same; this yields measurements that are valid relative to the reference. Here, the data was scaled from 0 to 1 by computing the maximum and minimum values of the data and deriving the scaling factor from them.
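
In MATLAB, this min-max scaling reduces to the following for a hypothetical feature vector x:

    % Scale a vector to the [0, 1] range using its own minimum and maximum.
    x = (x - min(x)) / (max(x) - min(x));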

Channel selection: Since it is possible to reduce the number of electrodes used for the classification of brain signals without losing substantial classification performance, we tested the influence of the number of electrodes on the classification accuracy. The authors of [18] demonstrate that the electrodes Fz, Cz, Pz, Oz, C3, C4, P3, P4, PO7 and PO8 have the most influence on P300 detection, while in [19] the best accuracy was achieved using data from the electrodes PO7, Pz, CPz, P7, FC1, Cz, PO8 and FC5.
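
A sketch of selecting such a subset by electrode label; channelNames (a cell array of labels aligned with the rows of eeg) is a hypothetical name:

    % Keep only the rows of the data matrix whose labels appear in the subset.
    subset = {'Fz','Cz','Pz','Oz','C3','C4','P3','P4','PO7','PO8'};
    idx = ismember(channelNames, subset);
    eegSubset = eeg(idx, :);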

Classification (Neural network): A neural network with a back-propagation learning algorithm was used for classification. Neural networks are among the methods adopted in recent studies; they imitate the structure of biological neural networks [20].

Using the neural network pattern recognition toolbox in MATLAB R2009a, the feature matrix was used as the input and the target matrix as the desired output. The target matrix contained the labels (1, 0), where 1 stands for target and 0 for non-target. The data was split into training, validation, and testing subsets with percentages of 70%, 15%, and 15% respectively. The training subset was used to train the model, the validation subset to evaluate it during training, and the testing subset to assess the final model.

The input and target matrices are first fed into the toolbox and the number of hidden neurons is set. Configurations with 15, 20, 30 and 40 hidden neurons were tested in order to choose the optimal number. The input layer has 1024 units (the length of the feature vector) and the output layer consists of a single neuron. Training is then initiated to classify the input according to the target matrix.
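
A minimal sketch of this stage follows. The paper used the nprtool GUI of MATLAB R2009a; the scripted patternnet interface of later releases is used here instead, and the variable names features (1024 × N) and targets (1 × N of 0/1 labels) are hypothetical:

    % Feed-forward pattern-recognition network with one hidden layer of 30
    % neurons, trained with back-propagation and a 70/15/15 data split.
    net = patternnet(30);
    net.divideParam.trainRatio = 0.70;
    net.divideParam.valRatio   = 0.15;
    net.divideParam.testRatio  = 0.15;
    [net, tr] = train(net, features, targets);
    predicted = net(features) > 0.5;     % threshold the network output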

Dataset 2

Data description: The utilized EEG data is derived from the EPFL BCI group [10]. The dataset was sampled at 2048 Hz and acquired from 34 electrodes placed at standard positions of the international 10-20 system. The system was tested on 8 subjects (4 disabled and 4 healthy).

Six images of a television, a telephone, a lamp, a door, a window, and a radio were displayed in front of the users on a laptop screen. Each subject completed four recording sessions, each consisting of six runs, one run for each of the six images. The images were flashed in random sequences, one image at a time. Each flash of an image lasted 100 ms, and during the following 300 ms none of the images was flashed, i.e. the inter-stimulus interval was 400 ms. The duration of one run was approximately one minute, and the duration of one session, including the setup of electrodes and short breaks between runs, was approximately 30 min [3].

Single trial extraction: The data from each session was provided as one matrix containing all the runs, with dimensions (34 × number of samples); each of the 34 rows corresponds to one electrode and each column to one temporal sample. Signal portions of 1000 ms were extracted from the data: single trials start at stimulus onset, i.e. at the beginning of the intensification of an image, and end 1000 ms after stimulus onset. Since the P300 Event-Related Potential (ERP) appears about 300 ms after the stimulus, this window is large enough to capture the time features required for an efficient classification [3].

Filtering: Filtering is a very important step for reducing the effect of artifacts and noise before any further processing. A 6th-order forward-backward Butterworth band-pass filter can eliminate some of the artifacts of known frequencies; the cut-off frequencies were 1 and 12 Hz [3]. The Butterworth filter approximates the ideal filter well in the pass band: it has an essentially flat amplitude-frequency response up to the cut-off frequency, a non-linear phase shift, and a monotonic drop in gain with frequency in the cut-off region, with a maximally flat response below the cut-off frequency [20].
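
A minimal sketch of this zero-phase filtering, assuming the quoted order refers to the resulting band-pass filter (butter with a two-element band returns twice the prototype order) and a hypothetical single-channel vector eegChannel:

    % 6th-order Butterworth band-pass (1-12 Hz) applied forward and backward
    % with filtfilt, which cancels the phase shift of the filter.
    fs = 2048;
    [b, a] = butter(3, [1 12] / (fs/2));
    filtered = filtfilt(b, a, eegChannel);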

Winsorizing: Eye movement and muscle activity can cause large-amplitude outliers in the EEG. To reduce this effect, the data from each electrode was winsorized: the 10th percentile of the samples from each electrode was taken as the lower limit and the 90th percentile as the upper limit, and values outside these limits were clipped to them [3].
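
A sketch of this step for one electrode's samples x (prctile is from the Statistics Toolbox):

    % Winsorize: clip every sample to the 10th-90th percentile range.
    limits = prctile(x, [10 90]);
    x = min(max(x, limits(1)), limits(2));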

Normalization: Normalization aims at allowing comparison of different signals and makes the data more informative. It is performed by dividing the signals by a reference so that the factors affecting the signal and the reference are the same. Normalization was applied using MATLAB: the data was scaled to zero mean and unit standard deviation [3].
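
For a hypothetical electrode vector x, this z-score scaling is simply:

    % Zero-mean, unit-standard-deviation scaling of one electrode's samples.
    x = (x - mean(x)) / std(x);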

Channel selection: The provided data was acquired with 34 electrodes. In this work, four electrode configurations (4, 8, 16, and 32 electrodes) were tested in order to reduce the number of electrodes.

Feature vector construction: After pre-processing, the samples from the selected electrodes were considered as feature vectors with a dimension of (Nx × Ny) where Nx denotes the number of electrodes and Ny denotes the number of samples. The target vector has a size of (Ny × 1) and carries 1 or -1 [3].

Classification (Neural network classification): A neural network with a back-propagation learning algorithm was used for classification, with the help of the neural network pattern recognition toolbox in MATLAB R2009a. The feature matrix was used as the input; the target matrix contained the labels (1, 0), where 1 stands for target and 0 for non-target.

The data was split into training, validation, and testing subsets with percentages of 70%, 15%, and 15% respectively. The training subset was used to train the model, the validation subset to evaluate it during training, and the testing subset to assess the final model. Sessions 1-3 were used for training and session 4 for testing.

The input and target matrices were first fed into the neural network and the number of hidden neurons was set. Configurations with 15, 20 and 30 hidden neurons were tested to optimize this parameter. Training was then initiated to classify the input according to the target matrix. As for the first dataset, the input layer has 1024 units and the output layer consists of a single neuron.

Results and Discussion

Dataset 1

Since the P300 appears about 300 ms after the stimulus, the chosen window is considered large enough to capture the time features required for an efficient classification [21]. After the extraction of the samples at the beginning of each intensification, the signals from the 64 channels were band-pass filtered from 0.1 Hz to 20 Hz. This range was chosen because cognitive activity rarely occurs outside the range 3-40 Hz. Chebyshev filters minimize the error between the idealized and the actual filter characteristic over the range of the filter; compared with other filter types, a Chebyshev filter can achieve a sharper transition between the pass band and the stop band with a lower filter order. This sharp transition produces smaller absolute errors and faster execution [14].

The db4 mother wavelet was used to extract the features of the EEG signals, and the approximation coefficients of the level-5 decomposition were used as the features. Using level 3 instead showed no change in the results [22].

Before classification, a normalization stage was necessary, since each character in the sequence was a separate data segment containing 15 repetitions of the flashing of all 12 rows/columns, and some individual data segments had much larger amplitudes. Each data segment was therefore normalized independently, and each channel was also treated independently. Finally, the normalized features from all channels were fed to a neural network model designed in MATLAB.

The three configurations yielding the highest accuracies were (64 electrodes, 30 hidden neurons), (10 electrodes, 15 hidden neurons), and (8 electrodes, 40 hidden neurons). Compared with the literature, the configuration with 64 electrodes and 30 hidden neurons, achieving an accuracy of 94.9%, outperformed the recent works [22,23] applied to the first dataset. Table 1 lists the compared values.

Approach              Accuracy
Dataset 1
  [22]                85%
  [23]                84%
  Suggested work      94.9%
Dataset 2
  [3] approach 1      92.5%
  [3] approach 2      91.9%
  [10]                99.5%
  [24]                84%
  [25]                84%
  Suggested work      95.8%

Table 1. Comparison of the suggested work accuracy with published data.

The system validity was also evaluated based on sensitivity and specificity, where True Positive (TP) is the number of signals classified correctly as a specific character, False Positive (FP) the number of signals classified wrongly as that character, True Negative (TN) the number of signals classified correctly as a different character, and False Negative (FN) the number of signals classified wrongly as a different character.

Specificity (SPC), or true negative rate:

SPC = TN/N = TN/(FP + TN) = 97.3% (1)

Sensitivity, or true positive rate (TPR):

TPR = TP/P = TP/(TP + FN) = 83.1% (2)

Dataset 2

After extracting the samples from the original data and applying feature extraction and channel selection, the resulting feature vectors were fed to the neural network.

The 32-electrode configuration with 20 hidden neurons outperformed the other configurations, with an average accuracy of 95.8% and a maximum of 100%. A strong increase in classification accuracy was observed between the electrode configurations of four and eight electrodes; using more than eight electrodes yielded comparatively little further improvement.

The obtained average sensitivity and specificity were 89.0% and 97.8% respectively, and the maximum values were 100% and 100% respectively. The proposed method hence outperformed the SWDA [24], PB [3], fuzzy logic [25], SVM and LDA [3] methods applied to the second dataset. The NN approach gave the same accuracy and specificity ranges as BLDA [10], with a slightly lower sensitivity; this might be due to the high number of inputs to the neural network. Table 1 lists the compared values.

Conclusion

Recent research has focused on improving the quality of life of people with motor disabilities, for example by building communication channels that allow them to interact with their surroundings. This paper aimed at finding out how an artificial neural network classifier, coupled with appropriate feature extraction and selection methods, improves the accuracy of a P300-based BCI system. The results show a promising approach that outperforms many recently proposed methods.

References