EEG Preprocessing Pipeline Guide

Purpose

EEG preprocessing transforms raw electrophysiological recordings into clean data suitable for analysis. Unlike generic signal processing, every preprocessing decision in EEG involves domain-specific trade-offs: filtering at the wrong cutoff distorts ERP component morphology, choosing the wrong reference scheme biases topographic maps, and automated artifact rejection with incorrect parameters either leaves artifacts in the data or removes real neural signal.

A competent programmer without EEG training would not know that a 1 Hz high-pass filter is needed before ICA but distorts slow ERP components, that average reference requires a minimum of 64 channels, or that the order of preprocessing steps matters critically. This skill encodes the domain judgment required to build a correct EEG preprocessing pipeline.

When to Use This Skill

Setting up an EEG preprocessing pipeline for ERP, time-frequency, or connectivity analysis
Choosing filter parameters for specific analysis goals
Deciding between ICA and ASR for artifact removal
Selecting an appropriate re-referencing scheme
Performing quality control on preprocessed EEG data
Reviewing or troubleshooting an existing EEG preprocessing pipeline

Research Planning Protocol

Before executing the domain-specific steps below, you MUST:

State the research question — What specific question is this analysis/paradigm addressing?
Justify the method choice — Why is this approach appropriate? What alternatives were considered?
Declare expected outcomes — What results would support vs. refute the hypothesis?
Note assumptions and limitations — What does this method assume? Where could it mislead?
Present the plan to the user and WAIT for confirmation before proceeding.

For detailed methodology guidance, see the research-literacy skill.

⚠️ Verification Notice

This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.

Standard Preprocessing Pipeline Order

The recommended order of preprocessing steps, based on established best practices (Luck, 2014; Onton & Makeig, 2006; Bigdely-Shamlo et al., 2015):

1. Import and inspect raw data
2. Remove (mark) bad channels
3. High-pass filter
4. Line noise removal
5. Re-reference
6. ICA decomposition and artifact removal
7. Interpolate bad channels
8. Epoch and baseline correct
9. Epoch rejection by amplitude threshold

Critical ordering constraints:

ICA must come after high-pass filtering (Step 3) because low-frequency drift degrades ICA decomposition (Winkler et al., 2015)
Bad channel removal (Step 2) must precede ICA because bad channels degrade component estimation
Bad channel interpolation (Step 7) must come after ICA to avoid ICA learning the interpolated data
Re-referencing (Step 5) should precede ICA so that all components are in a common reference frame
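
The ordering constraints above can be checked mechanically. The following is a minimal sketch, assuming hypothetical step names (labels chosen for this example, not identifiers from any toolbox); it reports every pair of steps that appears in the wrong relative order.

```python
# Step names are hypothetical labels for the nine steps listed above.
RECOMMENDED_ORDER = [
    "import", "mark_bad_channels", "high_pass", "line_noise",
    "re_reference", "ica", "interpolate", "epoch", "reject_epochs",
]

# (earlier, later) pairs taken from the critical ordering constraints.
MUST_PRECEDE = [
    ("high_pass", "ica"),
    ("mark_bad_channels", "ica"),
    ("ica", "interpolate"),
    ("re_reference", "ica"),
]


def ordering_violations(steps):
    """Return the (earlier, later) constraints that a pipeline violates."""
    position = {name: index for index, name in enumerate(steps)}
    violations = []
    for earlier, later in MUST_PRECEDE:
        # Only check constraints whose two steps are both present.
        if earlier in position and later in position:
            if position[earlier] > position[later]:
                violations.append((earlier, later))
    return violations
```

An empty list means that none of the four constraints is violated; it does not validate the rest of the pipeline.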

Step 1: Import and Inspect Raw Data

Convert to a standard format (EDF, BDF, or tool-native) if needed
Visually scroll through the entire recording to identify gross artifacts (disconnected electrodes, large muscle bursts, saturated channels)
Note segments with excessive noise for later rejection
Check sampling rate: 250-512 Hz typical for ERP; 1000+ Hz for high-frequency oscillatory analysis (Cohen, 2014)

Step 2: Remove Bad Channels

Bad channels contribute noise to re-referencing, ICA, and spatial interpolation. Identify them before other steps.

Identification Criteria

Criterion	Threshold	Source
Flat signal (zero variance)	Variance < 0.5 uV^2 for > 5 s	Bigdely-Shamlo et al., 2015
Excessive noise	Channel variance > 3 SD above the mean of all channels	Bigdely-Shamlo et al., 2015
Low correlation with neighbors	Mean correlation with neighboring channels < 0.4	Bigdely-Shamlo et al., 2015
Excessive line noise	50/60 Hz power > 4 SD above the mean	PREP pipeline (Bigdely-Shamlo et al., 2015)
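
As an illustration of the "excessive noise" criterion, the sketch below flags channels whose variance lies more than 3 SD above the mean variance of all channels. It is a simplified version that uses plain (non-robust) statistics and assumes a hypothetical input layout, a dict mapping channel names to lists of samples in uV. Robust pipelines such as PREP use more elaborate statistics.

```python
import statistics


def flag_noisy_channels(channel_data, z_threshold=3.0):
    """Return names of channels whose variance is an upper outlier.

    channel_data: dict of channel name -> list of samples (assumed layout).
    """
    variances = {
        name: statistics.pvariance(samples)
        for name, samples in channel_data.items()
    }
    values = list(variances.values())
    mean_var = statistics.mean(values)
    sd_var = statistics.pstdev(values)
    if sd_var == 0:
        # All channels have identical variance, so none stands out.
        return []
    return [
        name
        for name, var in variances.items()
        if (var - mean_var) / sd_var > z_threshold
    ]
```

With only a handful of channels a single outlier cannot reach 3 SD, so this check is only meaningful for arrays with more than about ten channels.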

Practical Limits

Remove no more than 10% of channels (e.g., 6 of 64). If more channels are bad, consider re-collecting data (Keil et al., 2014)
Mark bad channels for later interpolation (Step 7); do not interpolate yet
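
The 10% limit can be turned into a simple guard. This is a sketch with hypothetical function names; it uses integer arithmetic so that the limit is not affected by floating-point rounding.

```python
def max_removable_channels(n_channels, max_percent=10):
    """Largest number of channels that may be marked bad (rounded down)."""
    return n_channels * max_percent // 100


def within_channel_budget(bad_channels, n_channels, max_percent=10):
    """True if the number of bad channels respects the percentage limit."""
    return len(bad_channels) <= max_removable_channels(n_channels, max_percent)
```

For a 64-channel montage this gives a budget of 6 channels, matching the example above.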

Step 3: High-Pass Filtering

High-pass filtering removes slow drifts from skin potentials, electrode drift, and movement artifacts.

Analysis Goal	Cutoff Frequency	Filter Type	Source
ERP analysis	0.1 Hz	FIR zero-phase	Luck, 2014; Tanner et al., 2015
ICA decomposition	1 Hz	FIR zero-phase	Winkler et al., 2015
Time-frequency analysis	0.1 Hz	FIR zero-phase	Cohen, 2014
Slow cortical potentials	0.01 Hz	FIR zero-phase	Luck, 2014

Critical domain knowledge: For ERP studies, use 0.1 Hz for the final analysis data but 1 Hz for the ICA decomposition step. The recommended workflow is:

Filter a copy of the data at 1 Hz for ICA
Run ICA on the 1 Hz-filtered copy
Apply the ICA weights (unmixing matrix) to the original 0.1 Hz-filtered data
This preserves slow ERP components while giving ICA clean decomposition (Winkler et al., 2015)
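
The workflow above could be sketched in MNE-Python as follows. This is a sketch under stated assumptions rather than a definitive implementation: the function names are hypothetical, `raw` is assumed to be a preloaded `mne.io.Raw` object with bad channels already marked, extended Infomax is one reasonable algorithm choice among several, and the `random_state` value is arbitrary.

```python
def fit_ica_on_1hz_copy(raw, n_components=None, random_state=97):
    """Fit ICA on a 1 Hz high-passed copy; return it with the 0.1 Hz data.

    raw must be preloaded (e.g. raw.load_data()) because filtering
    operates on data held in memory.
    """
    # Imported here so this module can be loaded without MNE installed.
    from mne.preprocessing import ICA

    # Analysis data: gentle 0.1 Hz high-pass preserves slow ERP components.
    raw_for_analysis = raw.copy().filter(l_freq=0.1, h_freq=None)
    # ICA training data: 1 Hz high-pass removes drift that degrades ICA.
    raw_for_ica = raw.copy().filter(l_freq=1.0, h_freq=None)

    ica = ICA(
        n_components=n_components,
        method="infomax",
        fit_params=dict(extended=True),  # extended Infomax (assumed choice)
        random_state=random_state,
    )
    ica.fit(raw_for_ica)
    return ica, raw_for_analysis


def apply_ica(ica, raw_for_analysis, exclude):
    """Remove the chosen components from a copy of the 0.1 Hz data."""
    ica.exclude = list(exclude)
    # ICA.apply modifies its input in place, so work on a copy.
    return ica.apply(raw_for_analysis.copy())
```

The component indices passed as `exclude` would come from the identification step described under Step 6.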

Why not 1 Hz for ERPs? A 1 Hz high-pass filter distorts ERP waveforms: it introduces artificial shifts in the pre-stimulus baseline and reduces the amplitude of slow or sustained components such as sustained negativities or the P3b (Tanner et al., 2015; Acunzo et al., 2012).

Filter Specifications

Parameter	Recommendation	Source
Filter type	FIR (Finite Impulse Response), zero-phase	Widmann et al., 2015
Design	Windowed sinc (Hamming or Blackman window)	Widmann et al., 2015
Transition bandwidth	2x the cutoff frequency (e.g., 0.2 Hz for a 0.1 Hz cutoff), or the EEGLAB/MNE default	Widmann et al., 2015
Filter order	Determined by transition bandwidth; for a Hamming window, approximately 3.3 x sampling rate / transition bandwidth	Widmann et al., 2015
Phase distortion	Zero (use filtfilt or FIR zero-phase); never use causal filtering for offline analysis	Widmann et al., 2015
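
To make the relationship between transition bandwidth and filter order concrete, the sketch below estimates the order of a Hamming-windowed sinc FIR filter. The normalized transition width of 3.3 is an assumption specific to the Hamming window; other windows need a different factor. The function name is hypothetical.

```python
import math


def hamming_fir_order(sfreq, transition_bandwidth, normalized_width=3.3):
    """Estimate the FIR filter order for a Hamming-windowed sinc design.

    sfreq and transition_bandwidth are in Hz. The result is rounded up
    to an even order, which gives an odd number of taps (order + 1).
    """
    # round() guards against tiny floating-point excess before ceil().
    raw_order = round(normalized_width * sfreq / transition_bandwidth, 6)
    order = math.ceil(raw_order)
    if order % 2:
        order += 1
    return order
```

For a 0.1 Hz cutoff with a 0.2 Hz transition bandwidth at 250 Hz, this gives an order of 4126 samples, or roughly 16.5 s of data. A filter this long is one reason why high-pass filtering is applied to continuous data and not to short epochs.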

Domain warning: IIR (e.g., Butterworth) filters have a nonlinear phase response. Applied causally (in a single pass), they delay different frequencies by different amounts and shift ERP peak latencies. Prefer FIR zero-phase filters for ERP analysis unless there is a specific reason for causal filtering (Widmann et al., 2015).

Step 4: Line Noise Removal

Remove power line noise at 50 Hz (Europe, Asia) or 60 Hz (Americas) and harmonics.
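
Only harmonics below the Nyquist frequency are present in the sampled data, so the list of frequencies to clean depends on the sampling rate. A minimal sketch (the function name is hypothetical):

```python
def line_noise_harmonics(line_freq, sfreq):
    """Return the line frequency and its harmonics below Nyquist (in Hz)."""
    nyquist = sfreq / 2.0
    harmonics = []
    multiple = 1
    while multiple * line_freq < nyquist:
        harmonics.append(multiple * line_freq)
        multiple += 1
    return harmonics
```

For example, with 50 Hz mains and a 500 Hz sampling rate the frequencies to target are 50, 100, 150, and 200 Hz.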

Method	Description	When to Use	Source
Notch filter	Band-stop filter at 50/60 Hz	Simple, but removes neural signal at that frequency; not recommended for oscillatory analysis	--
CleanLine	Adaptive frequency-domain regression	Preferred for most analyses; preserves neural signal near 50/60 Hz	Mullen et al., 2012
ZapLine	Removes line noise via DSS decomposition	Alternative to CleanLine; effective for MEG and EEG	de Cheveigne, 2020
Spectral interpolation	Interpolates the notched frequency band	Preserves spectral continuity	Leske & Dalal, 2019

Recommendation: Use CleanLine or ZapLine over notch filters. Notch filters create spectral distortion ("ringing") and remove real neural oscillatory power in the gamma band near 50/60 Hz (Muthukumaraswamy, 2013).

Step 5: Re-Referencing

EEG signals are always measured as potential differences relative to a reference. The choice of reference affects all downstream analyses.

Reference Scheme	When to Use	Requirements	Source
Average reference	Default for dense arrays	Minimum 64 channels with good head coverage	Dien, 1998; Luck, 2014
Linked mastoids	Low-density arrays (< 64 ch)	Both mastoid electrodes clean	Luck, 2014
Cz reference	During ICA only (if Cz was recording reference)	--	Convention
REST (Reference Electrode Standardization Technique; infinity reference)	Approximation of a neutral (zero) reference at a point at infinity	Requires forward model; works best with dense arrays	Yao, 2001

Decision logic:

How many clean channels do you have?
 |
 +-- >= 64 with good head coverage
 |     --> Average reference (Dien, 1998)
 |
 +-- 32-63 channels
 |     --> Linked mastoids or average reference
 |         (average reference becomes unreliable with sparse coverage)
 |
 +-- < 32 channels
       --> Linked mastoids (Luck, 2014)

Domain warning: Average reference assumes dense, uniform electrode coverage of the head. With sparse arrays (< 64 channels) or missing channels, the average reference is biased and can distort topographies (Dien, 1998).
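
The decision logic above, together with the arithmetic of an average reference, can be sketched as follows. The function names and the returned strings are hypothetical, and the data layout (a list of channels, each a list of samples) is an assumption made to keep the example dependency-free.

```python
def choose_reference(n_clean_channels, good_head_coverage):
    """Suggest a reference scheme following the decision tree above."""
    if n_clean_channels >= 64 and good_head_coverage:
        return "average"
    if n_clean_channels >= 32:
        return "linked mastoids or average (check coverage)"
    return "linked mastoids"


def average_reference(data):
    """Subtract the mean across channels from every channel, per sample.

    data: list of channels, each a list of samples of equal length.
    """
    n_channels = len(data)
    n_samples = len(data[0])
    means = [
        sum(channel[i] for channel in data) / n_channels
        for i in range(n_samples)
    ]
    return [
        [channel[i] - means[i] for i in range(n_samples)]
        for channel in data
    ]
```

After average referencing, the channels sum to zero at every sample, which is why channels marked bad must be excluded from the mean.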

Step 6: ICA Decomposition and Artifact Removal

Independent Component Analysis (ICA) separates the EEG signal into statistically independent spatial components, allowing identification and removal of artifact sources (Onton & Makeig, 2006).

ICA Algorithm Selection

Algorithm	Pros	Cons	Source
Infomax (runica)	Standard, well-validated; most commonly used	Assumes super-Gaussian sources; can miss sub-Gaussian sources such as line noise	Bell & Sejnowski, 1995
Extended Infomax	Handles both sub- and super-Gaussian sources	Slightly slower	Lee et al., 1999
AMICA	Most accurate decomposition; can fit a mixture of multiple ICA models	Very slow; requires more data	Palmer et al., 2012
FastICA	Fast computation	Less stable; sensitive to initialization	Hyvarinen, 1999
PICARD	Fast, robust convergence	Newer, less validated	Ablin et al., 2018

Recommendation: Use Extended Infomax (runica with the extended option in EEGLAB) or PICARD (available in MNE-Python as method='picard'; note that MNE-Python's own default method is FastICA) for most analyses. AMICA is preferred for high-quality research when computation time is not a constraint.

Data Requirements for ICA

Minimum data points: At least 20 * n_channels^2 data points for stable decomposition (Onton & Makeig, 2006). For 64 channels: 20 * 64^2 = 81,920 samples (~5.3 minutes at 256 Hz)
High-pass filter at 1 Hz before ICA (Winkler et al., 2015)
Remove bad channels before ICA (bad channels produce bad components)
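
The data-length rule of thumb is simple arithmetic. The sketch below reproduces the 64-channel example; the function names are hypothetical and the factor of 20 is the heuristic quoted above.

```python
def min_samples_for_ica(n_channels, factor=20):
    """Minimum number of data points: factor * n_channels^2."""
    return factor * n_channels ** 2


def min_minutes_for_ica(n_channels, sfreq, factor=20):
    """Recording length in minutes needed to reach the minimum sample count."""
    return min_samples_for_ica(n_channels, factor) / sfreq / 60.0
```

Because the requirement grows with the square of the channel count, a 128-channel recording needs four times as much data as a 64-channel one at the same sampling rate.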

Artifact Component Identification

Automated Classification: ICLabel (Pion-Tonachini et al., 2019)

ICLabel classifies ICA components into 7 categories with probability estimates:

Category	Action	Typical Count
Brain	Keep	Most components
Eye (blinks and lateral eye movements)	Remove	1-3 components
Muscle	Remove if probability > 0.8	0-3 components
Heart	Remove if probability > 0.8	0-1 components
Line noise	Remove if probability > 0.8	0-1 components
Channel noise	Remove if probability > 0.8	0-2 components
Other	Inspect manually; not a specific artifact class	Varies

Recommended threshold: Remove components classified as non-brain with probability > 0.80 (conservative) or > 0.50 (liberal) (Pion-Tonachini et al., 2019).
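
Applying the threshold is a filtering step over the classifier output. The sketch below assumes the labels and probabilities are already available as two parallel lists; the label strings are hypothetical stand-ins and should be adapted to whatever the classifier in use actually returns.

```python
# Hypothetical label strings for the artifact classes discussed above.
ARTIFACT_LABELS = {"eye", "muscle", "heart", "line noise", "channel noise"}


def components_to_remove(labels, probabilities, threshold=0.8):
    """Return indices of components labeled as artifacts above threshold.

    labels[i] is the most probable class of component i and
    probabilities[i] is the probability assigned to that class.
    """
    return [
        index
        for index, (label, prob) in enumerate(zip(labels, probabilities))
        if label in ARTIFACT_LABELS and prob > threshold
    ]
```

Components labeled "brain" or "other" are never selected here, so ambiguous components are left for manual review.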

Manual Identification Criteria

Artifact Type	Topography	Time Course	Power Spectrum
Blink	Frontal maximum, bilateral	Sharp transients (~300 ms)	High power at low frequencies (< 5 Hz)
Saccade	Frontal, lateralized (left-right asymmetry)	Step-like deflections	Low-frequency dominated
Cardiac	Broad, diffuse or left-lateralized	Periodic (~1 Hz)	Peak at ~1 Hz
Muscle	Peripheral (temporal, neck electrodes)	High-frequency broadband noise	Elevated power > 20 Hz

Domain insight: Typically remove 1-3 components for eye artifacts and 0-2 for other artifact types. Removing more than 5-6 components total risks removing neural signal. If many components appear artifactual, the data quality may be too poor for reliable analysis (Onton & Makeig, 2006).

Alternative: ASR (Artifact Subspace Reconstruction)

ASR is a real-time-capable method that identifies and reconstructs artifact-contaminated data segments (Mullen et al., 2015).

Parameter	Default	Conservative	Liberal	Source
Burst criterion (SD)	20	10-15	25-30	Mullen et al., 2015; Chang et al., 2020
Window length	0.5 s	0.5 s	1.0 s	Mullen et al., 2015
Max rejected channels (proportion)	0.3	0.2	0.4	Mullen et al., 2015

When to use ASR vs. ICA:

Is data heavily contaminated with non-stationary artifacts?
 |
 +-- YES --> ASR first (for gross artifact removal), then ICA for residual eye artifacts
 |
 +-- NO --> ICA alone is usually sufficient

Domain insight: ASR and ICA can be combined. Apply ASR first to remove large transient artifacts (burst criterion = 20 SD), then run ICA on the ASR-cleaned data for residual artifact removal (Chang et al., 2020).

Step 7: Interpolate Bad Channels

After ICA, interpolate the bad channels identified in Step 2.

Method: Spherical spline interpolation (Perrin et al., 1989)
Maximum interpolation: No more than 10% of channels (Keil et al., 2014)
Order: Interpolate after ICA so that ICA does not learn interpolated (non-independent) data
Verify: Check that interpolated channel time courses are consistent with neighbors

Step 8: Epoch and Baseline Correct

Epoch time window: Typically -200 to 800 ms for ERP; adjust based on component of interest (Luck, 2014)
Baseline window: -200 to 0 ms pre-stimulus (standard for ERP; Luck, 2014)
Baseline correction: Subtract the mean of the baseline window from each time point in the epoch
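
Baseline correction as described above amounts to subtracting one number per channel and epoch. A minimal single-channel sketch, assuming times are given in seconds and that both ends of the baseline window are inclusive (an assumption; toolboxes differ on the end point):

```python
def baseline_correct(times, samples, baseline=(-0.2, 0.0)):
    """Subtract the mean of the baseline window from every sample.

    times and samples are parallel lists for one channel of one epoch.
    """
    start, end = baseline
    baseline_values = [v for t, v in zip(times, samples) if start <= t <= end]
    if not baseline_values:
        raise ValueError("no samples fall inside the baseline window")
    offset = sum(baseline_values) / len(baseline_values)
    return [v - offset for v in samples]
```

After correction, the mean of the baseline window is zero by construction, so post-stimulus amplitudes are expressed relative to the pre-stimulus level.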

Analysis Type	Epoch Window	Baseline Window	Source
Standard ERP	-200 to 800 ms	-200 to 0 ms	Luck, 2014
Late ERP (P600, LPP)	-200 to 1000 ms	-200 to 0 ms	Luck, 2014
MMN	-100 to 400 ms	-100 to 0 ms	Naatanen et al., 2007
Time-frequency	-1000 to 2000 ms	-500 to -200 ms (or single-trial normalization)	Cohen, 2014

Domain warning: For time-frequency analysis, use a longer baseline period (-500 to -200 ms) and avoid the immediate pre-stimulus period to prevent contamination by anticipatory activity. Alternatively, use single-trial baseline normalization (Cohen, 2014).

Step 9: Epoch Rejection by Amplitude Threshold

After ICA has removed stereotyped artifacts, apply amplitude-based rejection to catch remaining transient artifacts.

Criterion	Threshold	Source
Peak-to-peak amplitude	Reject if > 100-150 uV	Luck, 2014
Absolute amplitude	Reject if any sample exceeds +/- 75-100 uV	Luck, 2014
Flat epoch	Reject if max - min < 0.5 uV (dead channel/epoch)	Bigdely-Shamlo et al., 2015
Step function (for eye blinks missed by ICA)	Reject if > 80 uV step in 200 ms moving window	Luck, 2014
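
The peak-to-peak and flat-epoch criteria can be sketched as follows. The data layout is an assumption (each epoch is a list of channels, each channel a list of samples in uV), the function names are hypothetical, and the default thresholds are taken from the lower end of the ranges in the table.

```python
def epoch_is_bad(epoch_uv, ptp_max=100.0, flat_min=0.5):
    """True if any channel exceeds the peak-to-peak limit or is flat."""
    for channel in epoch_uv:
        peak_to_peak = max(channel) - min(channel)
        if peak_to_peak > ptp_max or peak_to_peak < flat_min:
            return True
    return False


def kept_epoch_indices(epochs_uv, ptp_max=100.0, flat_min=0.5):
    """Return the indices of epochs that pass both amplitude criteria."""
    return [
        index
        for index, epoch in enumerate(epochs_uv)
        if not epoch_is_bad(epoch, ptp_max, flat_min)
    ]
```

Rejecting the whole epoch when a single channel fails is the strictest option; some pipelines interpolate the offending channel for that epoch instead.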

Quality Benchmarks

Metric	Acceptable	Concerning	Source
Proportion of epochs rejected	< 25%	> 30% indicates poor data quality	Keil et al., 2014
Minimum retained trials per condition	30+	< 20 is unreliable for ERPs	Boudewyn et al., 2018
Minimum retained trials (absolute floor)	15	< 10 is unusable	Luck, 2014
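
These benchmarks are easy to check automatically after rejection. The sketch below uses the 25% and 30-trial values from the table as defaults; the function name and the structure of the returned dictionary are hypothetical.

```python
def rejection_summary(n_total_epochs, kept_per_condition,
                      max_rejected=0.25, min_trials=30):
    """Summarize epoch rejection against the quality benchmarks.

    kept_per_condition: dict of condition name -> number of retained trials.
    """
    n_kept = sum(kept_per_condition.values())
    proportion_rejected = 1.0 - n_kept / n_total_epochs
    return {
        "proportion_rejected": proportion_rejected,
        "too_many_rejected": proportion_rejected > max_rejected,
        "conditions_below_minimum": sorted(
            name for name, count in kept_per_condition.items()
            if count < min_trials
        ),
    }
```

A dataset can pass the overall rejection criterion and still fail for one condition, which is why the trial count is reported per condition.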

Low-Pass Filtering (Optional, Post-Epoching)

Analysis Type	Low-Pass Cutoff	Source
ERP (visualization and analysis)	30 Hz	Luck, 2014
ERP (preserving high-frequency info)	40 Hz	Luck, 2014
Oscillatory (alpha, beta)	No low-pass or 100 Hz	Cohen, 2014
Oscillatory (gamma)	No low-pass or 200 Hz	Cohen, 2014

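Domain warning: Low-pass filtering can be deferred until after epoching because low-pass filters have short impulse responses, which confines edge artifacts to a brief interval at the epoch boundaries. High-pass filtering, by contrast, should be applied to the continuous data (Step 3). For ERP grand averages, a 20-30 Hz low-pass is common for visualization, but it smears components in time and can shift peaks, so it should not be applied before statistical analysis of peak amplitudes/latencies (Luck, 2014).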

Common Pitfalls

Using 1 Hz high-pass for ERP analysis: A 1 Hz cutoff distorts slow ERP components. Use 0.1 Hz for final data; apply 1 Hz only for ICA training (Tanner et al., 2015; Acunzo et al., 2012)
Average reference with too few channels: Average reference with < 64 channels and incomplete head coverage biases topographies (Dien, 1998)
Running ICA on unfiltered data: Low-frequency drift degrades ICA decomposition quality. Always high-pass at 1 Hz before ICA (Winkler et al., 2015)
Removing too many ICA components: Removing > 5-6 components risks removing neural signal. If many components are artifactual, the data quality is too poor (Onton & Makeig, 2006)
Interpolating before ICA: Interpolated channels are linear combinations of their neighbors, so they add no independent information and make the data rank-deficient, which degrades the ICA decomposition. Interpolate after ICA (Luck, 2014)
Using IIR (Butterworth) filters: IIR filters introduce phase distortion that shifts ERP peak latencies. Use FIR zero-phase filters (Widmann et al., 2015)
Not checking the number of retained trials: If artifact rejection removes > 25% of trials, reconsider data quality or preprocessing parameters (Keil et al., 2014)
Applying notch filters for oscillatory analysis: Notch filters remove real neural gamma activity near 50/60 Hz. Use CleanLine or ZapLine instead (Muthukumaraswamy, 2013)

Minimum Reporting Checklist

Based on Keil et al. (2014) and Luck (2014):

References

Ablin, P., Cardoso, J. F., & Gramfort, A. (2018). Faster independent component analysis by preconditioning with Hessian approximations. IEEE Transactions on Signal Processing, 66(15), 4040-4049.
Acunzo, D. J., MacKenzie, G., & van Rossum, M. C. W. (2012). Systematic biases in early ERP and ERF components as a result of high-pass filtering. Journal of Neuroscience Methods, 209(1), 212-218.
Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1129-1159.
Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K. M., & Robbins, K. A. (2015). The PREP pipeline: Standardized preprocessing for large-scale EEG analysis. Frontiers in Neuroinformatics, 9, 16.
Boudewyn, M. A., Luck, S. J., Farrens, J. L., & Kappenman, E. S. (2018). How many trials does it take to get a significant ERP effect? Psychophysiology, 55(6), e13049.
Chang, C. Y., Hsu, S. H., Pion-Tonachini, L., & Jung, T. P. (2020). Evaluation of artifact subspace reconstruction for automatic artifact components removal in multi-channel EEG recordings. IEEE Transactions on Biomedical Engineering, 67(4), 1114-1121.
Cohen, M. X. (2014). Analyzing Neural Time Series Data: Theory and Practice. MIT Press.
de Cheveigne, A. (2020). ZapLine: A simple and effective method to remove power line artifacts. NeuroImage, 207, 116356.
Dien, J. (1998). Issues in the application of the average reference. Behavior Research Methods, Instruments, & Computers, 30(3), 449-457.
Hyvarinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3), 626-634.
Keil, A., Debener, S., Gratton, G., et al. (2014). Committee report: Publication guidelines and recommendations for studies using electroencephalography and magnetoencephalography. Psychophysiology, 51(1), 1-21.
Lee, T. W., Girolami, M., & Sejnowski, T. J. (1999). Independent component analysis using an extended infomax algorithm for mixed subgaussian and supergaussian sources. Neural Computation, 11(2), 417-441.
Leske, S., & Dalal, S. S. (2019). Reducing power line noise in EEG and MEG data via spectrum interpolation. NeuroImage, 189, 763-776.
Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique (2nd ed.). MIT Press.
Mullen, T. R., Kothe, C. A. E., Chi, Y. M., et al. (2015). Real-time neuroimaging and cognitive monitoring using wearable dry EEG. IEEE Transactions on Biomedical Engineering, 62(11), 2553-2567.
Muthukumaraswamy, S. D. (2013). High-frequency brain activity and muscle artifacts in MEG/EEG. Clinical Neurophysiology, 124(8), 1418-1426.
Naatanen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN). Clinical Neurophysiology, 118(12), 2544-2590.
Onton, J., & Makeig, S. (2006). Information-based modeling of event-related brain dynamics. Progress in Brain Research, 159, 99-120.
Palmer, J. A., Kreutz-Delgado, K., & Makeig, S. (2012). AMICA: An adaptive mixture of independent component analyzers with shared components. Technical Report, Swartz Center for Computational Neuroscience.
Perrin, F., Pernier, J., Bertrand, O., & Echallier, J. F. (1989). Spherical splines for scalp potential and current density mapping. Electroencephalography and Clinical Neurophysiology, 72(2), 184-187.
Pion-Tonachini, L., Kreutz-Delgado, K., & Makeig, S. (2019). ICLabel: An automated electroencephalographic independent component classifier, dataset, and website. NeuroImage, 198, 181-197.
Tanner, D., Morgan-Short, K., & Luck, S. J. (2015). How inappropriate high-pass filters can produce artifactual effects and incorrect conclusions in ERP studies of language and cognition. Psychophysiology, 52(8), 997-1009.
Widmann, A., Schroger, E., & Maess, B. (2015). Digital filter design for electrophysiological data -- A practical approach. Journal of Neuroscience Methods, 250, 34-46.
Winkler, I., Debener, S., Muller, K. R., & Tangermann, M. (2015). On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. Proceedings of EMBC, 4101-4105.
Yao, D. (2001). A method to standardize a reference of scalp EEG recordings to a point at infinity. Physiological Measurement, 22(4), 693-711.

See references/ for step-by-step pipeline code templates and parameter lookup tables.