
EEG Preprocessing

Preprocessing is a crucial step in EEG analysis. It allows us to isolate brain activity from the other signals the EEG picks up, improving the signal-to-noise ratio, emphasizing the events in your signal, and removing unnecessary artifacts.

A Quick Overview

Intro
  • The importance of preprocessing

  • Data Acquisition and Loading

Preprocessing steps

Data Cleaning:

  • Filtering

  • Re-referencing

  • Removing artifacts (ICA)

Code Tutorial

Step-by-step Python-based instructions for working with a 'Raw' MNE object.

Preprocessing
Why do we need it?

EEG is a powerful tool for studying brain activity, but raw EEG data is often contaminated with various artifacts and noise. These unwanted signals can significantly obscure the underlying brain signals of interest, making accurate analysis and interpretation challenging.

 

Therefore, preprocessing is a critical initial step in any EEG analysis pipeline. It involves a series of techniques designed to clean and prepare the data, ensuring that subsequent analyses are based on reliable and meaningful information.

Data Acquisition and Loading

Before diving into preprocessing, it's essential to understand the format of your EEG data and how to load it into a suitable Python environment.

1

Data Format

EEG data is typically stored in specialized file formats such as the European Data Format (EDF) or the BCI2000 format. These files contain the recorded EEG signals along with metadata such as channel names, sampling rate, and recording parameters.

2

EEG Code Libraries

To work with EEG data in Python, we'll use libraries such as MNE-Python and pyEDFlib. MNE-Python is a comprehensive toolbox for EEG analysis, offering functions to handle various file formats and perform advanced signal processing tasks. pyEDFlib is another option, designed specifically for reading and writing EDF files.

3

Additional Essentials

In addition to “mne”, it's important to import libraries such as “numpy”, “matplotlib.pyplot”, and “scipy” / “scipy.signal” for their mathematical and plotting functions.

You can start coding on Google Colab (using your Google account).

4

Get your own

It’s possible to download an open dataset for practice!

For example, check out PhysioNet (https://physionet.org/), which offers a vast collection of publicly available biomedical and physiological signals, including EEG data. One dataset to start with:

PhysioNet.org >> Data >> EEG Motor Movement/Imagery Dataset.

5

More Details

You can see a detailed guide for data loading in this link.

Data Cleaning

Data cleaning is a crucial step in EEG preprocessing to enhance data quality and remove unwanted artifacts. Artifacts are spurious signals that can obscure the underlying brain activity. Common artifacts include eye blinks, muscle activity, line noise, and electrode noise.

Filtering

Filtering is applied to remove frequency components that are not of interest. Common filtering techniques include:

  • Notch filtering: Removes power-line interference (e.g., 50 Hz or 60 Hz).

  • Band-pass filtering: Isolates a specific frequency band related to brain activity (e.g., alpha, beta, theta).

Notch filter
A notch filter is applied when we want to eliminate a single, narrow frequency while leaving the rest of the spectrum intact. For example, it can remove the interference caused by 60 Hz power lines.
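
To make the idea concrete, here is a minimal sketch of a notch filter applied to a synthetic signal using scipy.signal. The sampling rate, the 10 Hz "brain" rhythm, the quality factor Q and the helper amplitude_at are all assumptions chosen for illustration; they are not part of the tutorial's dataset.

```python
import numpy as np
from scipy import signal

fs = 250.0                       # assumed sampling rate (Hz)
t = np.arange(int(4 * fs)) / fs  # 4 seconds of time stamps

# Synthetic "EEG": a 10 Hz rhythm contaminated by 60 Hz power-line noise
brain = np.sin(2 * np.pi * 10 * t)
line_noise = 0.5 * np.sin(2 * np.pi * 60 * t)
noisy = brain + line_noise

# Narrow notch centred on 60 Hz; a larger Q gives a narrower notch
b, a = signal.iirnotch(w0=60.0, Q=30.0, fs=fs)

# filtfilt applies the filter forward and backward (zero phase shift)
cleaned = signal.filtfilt(b, a, noisy)


def amplitude_at(x, freq):
    """Amplitude of the spectral component of x closest to freq (Hz)."""
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]


print("60 Hz before:", amplitude_at(noisy, 60.0))
print("60 Hz after: ", amplitude_at(cleaned, 60.0))
print("10 Hz after: ", amplitude_at(cleaned, 10.0))
```

The 60 Hz component should shrink sharply while the 10 Hz rhythm is left almost untouched, which is exactly what we want from a notch.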

 

Band-pass filter
Alternatively, we can preserve a range of frequencies, usually called the “pass band”, and filter out all frequencies outside this range.

A “low-pass” filter retains frequencies below a certain cut-off and excludes everything higher (it "lets" only the low frequencies "pass"). Conversely, a “high-pass” filter eliminates low-frequency components. Combining a high-pass and a low-pass filter gives us a band-pass filter.
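
As a rough illustration of a band-pass filter, the sketch below builds a synthetic signal from a slow drift, a 10 Hz rhythm and fast noise, then keeps only 1-30 Hz with a Butterworth filter from scipy.signal. The sampling rate, the filter order, the cut-offs and the helper amplitude_at are assumptions for this example only.

```python
import numpy as np
from scipy import signal

fs = 250.0                       # assumed sampling rate (Hz)
t = np.arange(int(8 * fs)) / fs  # 8 seconds of time stamps

# Slow drift (0.25 Hz) + alpha-like rhythm (10 Hz) + fast noise (70 Hz)
drift = 2.0 * np.sin(2 * np.pi * 0.25 * t)
alpha = 1.0 * np.sin(2 * np.pi * 10 * t)
fast = 0.5 * np.sin(2 * np.pi * 70 * t)
mixed = drift + alpha + fast

# 4th-order Butterworth band-pass (1-30 Hz) as second-order sections,
# which is numerically more stable than the (b, a) form
sos = signal.butter(4, [1.0, 30.0], btype="bandpass", fs=fs, output="sos")
band_passed = signal.sosfiltfilt(sos, mixed)


def amplitude_at(x, freq):
    """Amplitude of the spectral component of x closest to freq (Hz)."""
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]


for f in (0.25, 10.0, 70.0):
    print(f, "Hz:", amplitude_at(mixed, f), "->", amplitude_at(band_passed, f))
```

Only the 10 Hz component lies inside the pass band, so it should survive while the drift and the fast noise are strongly attenuated.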

EEG Frequency Bands

EEG signals are typically divided into frequency bands based on their characteristics and cognitive correlates:

  • Delta (0.5-4 Hz): Associated with deep sleep.

  • Theta (4-8 Hz): Related to drowsiness, sleep stages, and memory.

  • Alpha (8-12 Hz): Present during relaxed wakefulness and closed-eye states.

  • Beta (12-30 Hz): Linked to active attention, motor control, and arousal.

  • Gamma (30-100 Hz): Associated with higher cognitive functions, sensory processing, and consciousness.

The specific frequency bands of interest will depend on your research question. For example, to study motor imagery, you might focus on the beta band.
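
One way to see which band dominates a signal is to estimate its power spectrum and sum the power inside each band. The sketch below does this with Welch's method from scipy.signal on a synthetic 10 Hz signal. The function band_powers, the sampling rate and the test signal are hypothetical, and the band edges are simply the ones listed above.

```python
import numpy as np
from scipy import signal

# Band edges in Hz, as listed above
BANDS = {
    "delta": (0.5, 4),
    "theta": (4, 8),
    "alpha": (8, 12),
    "beta": (12, 30),
    "gamma": (30, 100),
}


def band_powers(x, fs):
    """Approximate power of x in each EEG band from a Welch spectrum."""
    freqs, psd = signal.welch(x, fs=fs, nperseg=int(2 * fs))
    df = freqs[1] - freqs[0]  # frequency resolution of the estimate
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = float(np.sum(psd[mask]) * df)
    return powers


fs = 250.0                        # assumed sampling rate (Hz)
t = np.arange(int(10 * fs)) / fs  # 10 seconds of time stamps
rng = np.random.default_rng(0)

# A 10 Hz rhythm with a little white noise on top
x = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

powers = band_powers(x, fs)
dominant = max(powers, key=powers.get)
print(powers)
print("dominant band:", dominant)
```

Because the rhythm sits at 10 Hz, most of the power should land in the alpha band.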

Referencing

Referencing involves changing the reference electrode for the EEG data. This can be useful for reducing common-mode noise and enhancing signal-to-noise ratio.


Two common referencing techniques include:

  • Average reference: The average of all electrodes is used as a reference.

  • Bipolar referencing: Each channel is expressed as the difference between two (usually neighboring) electrodes.

Average referencing calculates the mean voltage across all electrodes at each time point and subtracts this value from each electrode's signal. This method assumes that the overall electrical activity of the head sums to zero, effectively creating a virtual reference point. While commonly used, the average reference is sensitive to artifacts: a single noisy channel contaminates the mean, and therefore every re-referenced channel, so it may not represent a truly neutral reference.
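
The computation itself is short. Here is a minimal NumPy sketch on made-up data (4 channels sharing a common noise signal); the array shapes and noise levels are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_channels, n_samples = 4, 1000

# Noise that appears identically on every channel (common-mode noise)
common_noise = rng.standard_normal(n_samples)
# Small channel-specific activity
local = 0.1 * rng.standard_normal((n_channels, n_samples))
data = local + common_noise  # shape: (channels, samples)

# Mean across channels at each time point, then subtract it everywhere
average = data.mean(axis=0, keepdims=True)
rereferenced = data - average

print("std before:", data.std())
print("std after: ", rereferenced.std())
```

After re-referencing, the channels sum to zero at every time point and the shared noise is gone, leaving only the channel-specific activity.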

Bipolar referencing involves subtracting the signal from one electrode from the signal of another. This approach highlights voltage differences between specific electrode pairs, which can be useful for studying localized brain activity. However, bipolar referencing can also amplify noise and reduce the overall signal-to-noise ratio.
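
A bipolar channel is just a subtraction. The sketch below uses three made-up channels where only C3 carries local activity; the channel names, the signals and the helper bipolar are hypothetical.

```python
import numpy as np

ch_names = ["F3", "C3", "P3"]
fs = 100.0                  # assumed sampling rate (Hz)
t = np.arange(int(fs)) / fs # 1 second of time stamps

shared = np.sin(2 * np.pi * 5 * t)           # seen by all electrodes
local_c3 = 0.3 * np.sin(2 * np.pi * 12 * t)  # present only under C3
data = np.vstack([shared, shared + local_c3, shared])


def bipolar(data, ch_names, anode, cathode):
    """Return the bipolar channel anode minus cathode."""
    return data[ch_names.index(anode)] - data[ch_names.index(cathode)]


f3_c3 = bipolar(data, ch_names, "F3", "C3")
f3_p3 = bipolar(data, ch_names, "F3", "P3")

print("F3-C3 peak:", np.abs(f3_c3).max())
print("F3-P3 peak:", np.abs(f3_p3).max())
```

The shared activity cancels in both pairs, so F3-C3 contains only the local C3 activity (with flipped sign) and F3-P3 is flat. MNE offers mne.set_bipolar_reference for doing this on a Raw object.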

 

The choice between average reference and bipolar referencing depends on the specific research question and the characteristics of the EEG data.


Removing Artifacts: ICA

Independent Component Analysis (ICA) is a statistical technique used to separate a mixed signal into its underlying independent components. In the context of EEG, this means breaking down the complex EEG signal into a set of simpler signals that are statistically independent.

These independent components often represent different sources of brain activity or artifacts such as eye blinks or muscle movements. By applying ICA to EEG data, researchers can identify and remove unwanted artifacts, enhancing the quality of the signal for subsequent analysis. This process involves decomposing the EEG data into a set of independent components, examining these components to identify artifacts, and then reconstructing the EEG data without the identified artifacts.
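
The "remove and reconstruct" step can be sketched with plain NumPy. To keep the example short, the mixing matrix is assumed to be known; a real ICA algorithm has to estimate it from the data. All signals, the matrix values and the variable names are hypothetical.

```python
import numpy as np

fs = 250.0  # assumed sampling rate (Hz)
n_samples = 2000
t = np.arange(n_samples) / fs

# Two hypothetical sources: a 10 Hz rhythm and two blink-like bumps
brain_source = np.sin(2 * np.pi * 10 * t)
blink_source = np.zeros(n_samples)
blink_source[500:550] = 5.0
blink_source[1500:1550] = 5.0
sources = np.vstack([brain_source, blink_source])

# Mixing matrix (channels x components). ASSUMPTION: known here;
# in practice ICA estimates it from the recorded data.
mixing = np.array([[1.0, 0.9],
                   [0.8, 0.4],
                   [0.6, 0.1]])
eeg = mixing @ sources  # what the three electrodes record

# Step 1: decompose the recording into components
unmixing = np.linalg.pinv(mixing)
components = unmixing @ eeg

# Step 2: pick the artifact component (here chosen by inspection)
artifact_idx = 1

# Step 3: zero it out and mix the remaining components back
kept = components.copy()
kept[artifact_idx] = 0.0
cleaned = mixing @ kept

print("peak before:", np.abs(eeg).max())
print("peak after: ", np.abs(cleaned).max())
```

The cleaned recording keeps the rhythm on every channel while the blink bumps disappear.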

Several techniques can be employed to mitigate the impact of artifacts. These include:

  • Independent Component Analysis (ICA): This method decomposes the EEG signal into independent components, allowing for the identification and removal of artifact-related components.

  • Automatic artifact detection: Some algorithms can automatically detect and remove artifacts based on predefined criteria.
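
A very simple "predefined criterion" is a peak-to-peak amplitude threshold: any time window in which a channel swings more than a chosen amount is flagged. The sketch below is one possible version. The function find_bad_segments, the 1-second window and the 150 µV threshold are assumptions for illustration, not recommended values.

```python
import numpy as np


def find_bad_segments(data, fs, window_s=1.0, threshold=150e-6):
    """Indices of windows whose peak-to-peak amplitude (in volts)
    exceeds the threshold on at least one channel."""
    win = int(window_s * fs)
    n_windows = data.shape[1] // win
    bad = []
    for i in range(n_windows):
        segment = data[:, i * win:(i + 1) * win]
        ptp = segment.max(axis=1) - segment.min(axis=1)
        if np.any(ptp > threshold):
            bad.append(i)
    return bad


fs = 100.0  # assumed sampling rate (Hz)
rng = np.random.default_rng(1)

# 3 channels, 5 seconds of background activity around 10 microvolts
data = 10e-6 * rng.standard_normal((3, 500))
# Blink-like 300 microvolt deflection on channel 0, inside window 2
data[0, 250:260] += 300e-6

bad_windows = find_bad_segments(data, fs)
print("flagged windows:", bad_windows)
```

Only the window containing the large deflection should be flagged. Flagged windows can then be dropped or inspected by hand.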

Coding Tutorial

This part covers hands-on practice using the MNE-Python library.

Open a new Jupyter notebook or Google Colab notebook.
The following steps will help you preprocess an example dataset from the MNE library.

Note: this guide is based on the MNE-Python overview "Introduction to EEG-preprocessing" (from g0rella.github.io).

Importing Necessary Libraries

import mne

import numpy as np

import matplotlib.pyplot as plt

from mne.datasets import eegbci

# default figure size (in inches) for matplotlib plots
plt.rcParams['figure.figsize'] = (5, 5)

Load an example dataset from MNE, Create MNE Object

# download (or reuse the cached copy of) run 1 of subject 3 from the EEGBCI dataset
path = mne.datasets.eegbci.load_data(3, 1)

path[0]

raw = mne.io.read_raw_edf(path[0], preload=True)

raw.info['chs']

# standardize the channel names so that they match the montage labels
eegbci.standardize(raw)

# setting standard montage (the arrangement of electrodes on the scalp)

montage = mne.channels.make_standard_montage('standard_1020')

raw.set_montage(montage)

Plot and visualize the raw data

raw.plot_sensors(kind='3d')

raw.plot(scalings = dict(eeg=200e-6)) # setting +/- 200 µV scale

ch_names = ['A', 'B']

sfreq = 200 # sampling frequency (hertz)

info = mne.create_info(ch_names, sfreq) # See docs for full list of Info options.

samples = np.array([[-1, 0, -1, 1, 1], [0, 1, 0, -1, 0]]) # Samples for each channel

loadedRaw = mne.io.RawArray(samples, info) # build a Raw object directly from the array

loadedRaw.plot(scalings='auto')

raw.plot_psd(tmin=0, tmax=60, fmin=2, fmax=80, average=True, spatial_colors=False)

Filtering

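# notch out power-line noise (use 50 or 60 Hz, depending on your local mains frequency)
raw.notch_filter(freqs=[50])

# set the low and high cut-off frequencies according to your analysis; the frequencies inside this band are the ones that "pass"

raw.filter(l_freq=1, h_freq=30)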

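# mark a channel as bad (C4 here, as an example)
raw.info['bads'] = ['C4']

picks = mne.pick_types(raw.info, eeg=True, exclude='bads') # indices of the good EEG channels

raw.plot(scalings=dict(eeg=200e-6), bad_color='red');

print(raw.info['chs'])

# replace the bad channel with a signal interpolated from its neighbors
raw.interpolate_bads(reset_bads=True)

raw.plot(scalings=dict(eeg=200e-6));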

Referencing

raw.set_eeg_reference('average') # you can use other references

ICA

from mne.preprocessing import ICA

# play around with this number to get components
# that seem to represent the actual brain activations well

num_components = 15

ica = ICA(n_components=num_components, method='fastica')

ica.fit(raw)

ica.plot_components();

raw.plot(n_channels=32, scalings=dict(eeg=100e-6));

ica.plot_properties(raw, picks=0);

ica.plot_properties(raw, picks=9);

ica.plot_overlay(raw, exclude=[0]);

ica.exclude = [0]
ica.apply(raw)

raw.plot(scalings=dict(eeg=200e-6));
