EEG Preprocessing Pipeline Guide

Guides EEG preprocessing: filtering, artifact rejection (ICA/ASR), re-referencing, interpolation

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "EEG Preprocessing Pipeline Guide" with this command: npx skills add haoxuanlithuai/awesome_cognitive_and_neuroscience_skills/haoxuanlithuai-awesome-cognitive-and-neuroscience-skills-eeg-preprocessing-pipeline-guide

Purpose

EEG preprocessing transforms raw electrophysiological recordings into clean data suitable for analysis. Unlike generic signal processing, every preprocessing decision in EEG involves domain-specific trade-offs: filtering at the wrong cutoff distorts ERP component morphology, choosing the wrong reference scheme biases topographic maps, and automated artifact rejection with incorrect parameters either leaves artifacts in the data or removes real neural signal.

A competent programmer without EEG training would not know that a 1 Hz high-pass filter is needed before ICA but distorts slow ERP components, that an average reference is only reliable with dense, whole-head coverage (roughly 64 or more channels), or that the order of preprocessing steps matters critically. This skill encodes the domain judgment required to build a correct EEG preprocessing pipeline.

When to Use This Skill

  • Setting up an EEG preprocessing pipeline for ERP, time-frequency, or connectivity analysis
  • Choosing filter parameters for specific analysis goals
  • Deciding between ICA and ASR for artifact removal
  • Selecting an appropriate re-referencing scheme
  • Performing quality control on preprocessed EEG data
  • Reviewing or troubleshooting an existing EEG preprocessing pipeline

Research Planning Protocol

Before executing the domain-specific steps below, you MUST:

  1. State the research question — What specific question is this analysis/paradigm addressing?
  2. Justify the method choice — Why is this approach appropriate? What alternatives were considered?
  3. Declare expected outcomes — What results would support vs. refute the hypothesis?
  4. Note assumptions and limitations — What does this method assume? Where could it mislead?
  5. Present the plan to the user and WAIT for confirmation before proceeding.

For detailed methodology guidance, see the research-literacy skill.

⚠️ Verification Notice

This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.

Standard Preprocessing Pipeline Order

The recommended order of preprocessing steps, based on established best practices (Luck, 2014; Onton & Makeig, 2006; Bigdely-Shamlo et al., 2015):

1. Import and inspect raw data
2. Remove (mark) bad channels
3. High-pass filter
4. Line noise removal
5. Re-reference
6. ICA decomposition and artifact removal
7. Interpolate bad channels
8. Epoch and baseline correct
9. Epoch rejection by amplitude threshold

Critical ordering constraints:

  • ICA must come after high-pass filtering (Step 3) because low-frequency drift degrades ICA decomposition (Winkler et al., 2015)
  • Bad channel removal (Step 2) must precede ICA because bad channels degrade component estimation
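  • Bad channel interpolation (Step 7) must come after ICA: interpolated channels are linear combinations of other channels, so they make the data rank-deficient, which can produce spurious components unless the number of ICA components is reduced accordingly
  • Re-referencing (Step 5) should precede ICA so that all components are in a common reference frame; note that average referencing also reduces the data rank by one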
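The ordering constraints above can be encoded as a quick check when reviewing an existing pipeline. This is a minimal sketch: the step names are hypothetical labels, not identifiers from EEGLAB or MNE-Python, and only the four constraints listed above are checked.

```python
# Each pair (a, b) means step a must run before step b.
ORDER_CONSTRAINTS = [
    ("bad_channels", "ica"),   # bad channels degrade component estimation
    ("highpass", "ica"),       # low-frequency drift degrades the decomposition
    ("rereference", "ica"),    # components share a common reference frame
    ("ica", "interpolate"),    # interpolated channels make the data rank-deficient
]


def check_order(steps):
    """Return a list of violated constraints; an empty list means the order is valid.

    Steps absent from `steps` are simply not checked.
    """
    position = {name: i for i, name in enumerate(steps)}
    problems = []
    for before, after in ORDER_CONSTRAINTS:
        if before in position and after in position and position[before] > position[after]:
            problems.append(f"'{before}' must come before '{after}'")
    return problems


pipeline = ["import", "bad_channels", "highpass", "line_noise", "rereference",
            "ica", "interpolate", "epoch", "epoch_rejection"]
print(check_order(pipeline))  # []
```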

Step 1: Import and Inspect Raw Data

  • Convert to a standard format (EDF, BDF, or tool-native) if needed
  • Visually scroll through the entire recording to identify gross artifacts (disconnected electrodes, large muscle bursts, saturated channels)
  • Note segments with excessive noise for later rejection
  • Check sampling rate: 250-512 Hz typical for ERP; 1000+ Hz for high-frequency oscillatory analysis (Cohen, 2014)

Step 2: Remove Bad Channels

Bad channels contribute noise to re-referencing, ICA, and spatial interpolation. Identify them before other steps.

Identification Criteria

Criterion | Threshold | Source
Flat signal (near-zero variance) | Variance < 0.5 uV^2 for > 5 s | Bigdely-Shamlo et al., 2015
Excessive noise | Channel variance > 3 SD above the mean of all channels | Bigdely-Shamlo et al., 2015
Low correlation with neighbors | Mean correlation with neighboring channels < 0.4 | Bigdely-Shamlo et al., 2015
Excessive line noise | 50/60 Hz power > 4 SD above the mean | PREP pipeline (Bigdely-Shamlo et al., 2015)

Practical Limits

  • Remove no more than 10% of channels (e.g., 6 of 64). If more channels are bad, consider re-collecting data (Keil et al., 2014)
  • Mark bad channels for later interpolation (Step 7); do not interpolate yet
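A minimal NumPy sketch of the first three identification criteria and the 10% limit, assuming continuous data as an (n_channels, n_samples) array in microvolts. The function name, the `neighbors` mapping, and the non-overlapping 5 s windows used for the flat-signal test are illustrative assumptions; this is not the PREP implementation, and the line-noise criterion is omitted.

```python
import numpy as np


def find_bad_channels(data, fs, neighbors=None, flat_var=0.5, flat_dur_s=5.0,
                      var_z=3.0, min_corr=0.4, max_fraction=0.10):
    """Flag bad channels in continuous EEG (n_channels x n_samples, microvolts).

    Returns (sorted list of bad channel indices, True if more than
    max_fraction of the channels are bad). `neighbors` is a hypothetical
    {channel: [neighbor channels]} mapping; if None, all other channels are used.
    """
    data = np.asarray(data, dtype=float)
    n_ch, n_samp = data.shape

    # Flat signal: variance below flat_var (uV^2) in any non-overlapping window
    win = int(round(flat_dur_s * fs))
    flat = set()
    for start in range(0, n_samp - win + 1, win):
        seg_var = data[:, start:start + win].var(axis=1)
        flat.update(np.flatnonzero(seg_var < flat_var).tolist())

    # Excessive noise: variance more than var_z SDs above the mean across channels
    ch_var = data.var(axis=1)
    z = (ch_var - ch_var.mean()) / ch_var.std()
    noisy = set(np.flatnonzero(z > var_z).tolist())

    # Low correlation with neighbors (flat channels skipped: correlation undefined)
    live = [c for c in range(n_ch) if c not in flat]
    corr = np.corrcoef(data[live])
    row = {c: i for i, c in enumerate(live)}
    low_corr = set()
    for c in live:
        nbs = neighbors.get(c, []) if neighbors is not None else live
        cols = [row[nb] for nb in nbs if nb != c and nb in row]
        if cols and corr[row[c], cols].mean() < min_corr:
            low_corr.add(c)

    bad = sorted(flat | noisy | low_corr)
    return bad, len(bad) > max_fraction * n_ch
```

Plain z-scores are pulled toward an extreme outlier, so robust versions (median and median absolute deviation) are a common alternative in practice.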

Step 3: High-Pass Filtering

High-pass filtering removes slow drifts from skin potentials, electrode drift, and movement artifacts.

Analysis Goal | Cutoff Frequency | Filter Type | Source
ERP analysis | 0.1 Hz | FIR zero-phase | Luck, 2014; Tanner et al., 2015
ICA decomposition | 1 Hz | FIR zero-phase | Winkler et al., 2015
Time-frequency analysis | 0.1 Hz | FIR zero-phase | Cohen, 2014
Slow cortical potentials | 0.01 Hz | FIR zero-phase | Luck, 2014

Critical domain knowledge: For ERP studies, use 0.1 Hz for the final analysis data but 1 Hz for the ICA decomposition step. The recommended workflow is:

  1. Filter a copy of the data at 1 Hz for ICA
  2. Run ICA on the 1 Hz-filtered copy
  3. Apply the ICA weights (unmixing matrix) to the original 0.1 Hz-filtered data
  4. This preserves slow ERP components while giving ICA clean decomposition (Winkler et al., 2015)

Why not 1 Hz for ERPs? A 1 Hz high-pass filter distorts ERP waveforms: it introduces artifactual deflections of opposite polarity around large components (which can appear as spurious earlier effects or pre-stimulus baseline shifts) and reduces the amplitude of sustained components such as the sustained negativity or the P3b (Tanner et al., 2015; Acunzo et al., 2012).
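The workflow above works because the unmixing matrix is just a linear operator: it can be estimated on one filtered copy and applied to another. The sketch below assumes a square, full-rank unmixing matrix W and uses a hypothetical function name; in MNE-Python the equivalent is typically to call `ICA.fit` on the 1 Hz copy, set `ica.exclude`, and call `ica.apply` on the 0.1 Hz data.

```python
import numpy as np


def remove_components(data, unmixing, exclude):
    """Back-project all ICA components except those in `exclude`.

    data: (n_channels, n_samples), e.g. the 0.1 Hz high-passed recording.
    unmixing: square, full-rank W estimated on the 1 Hz high-passed copy.
    """
    mixing = np.linalg.inv(unmixing)          # A = W^-1; columns are component scalp maps
    sources = unmixing @ data                 # component activations for *this* data
    keep = np.setdiff1d(np.arange(unmixing.shape[0]), exclude)
    return mixing[:, keep] @ sources[keep]    # reconstruct from retained components only
```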

Filter Specifications

Parameter | Recommendation | Source
Filter type | FIR (Finite Impulse Response), zero-phase | Widmann et al., 2015
Design | Windowed sinc (Hamming or Blackman window) | Widmann et al., 2015
Transition bandwidth | 2x the cutoff frequency (e.g., 0.2 Hz for a 0.1 Hz cutoff, so the transition band spans 0-0.2 Hz around the -6 dB cutoff), or the EEGLAB/MNE default | Widmann et al., 2015
Filter order | Determined by window type and transition bandwidth: about 3.3 x sampling rate / transition bandwidth for a Hamming window (5.5 x for Blackman) | Widmann et al., 2015
Phase distortion | Zero (use a linear-phase FIR with delay compensation, or two-pass filtfilt); avoid causal filtering for offline analysis unless there is a specific reason for it | Widmann et al., 2015

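Domain warning: Causal, single-pass IIR filters (e.g., Butterworth) introduce frequency-dependent phase distortion that shifts ERP peak latencies. Use FIR zero-phase filters for ERP analysis unless there is a specific reason for causal filtering; if an IIR filter is used, apply it forward and backward (two-pass) to cancel the phase shift (Widmann et al., 2015).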
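To make the filter specifications concrete, here is a NumPy-only sketch of a Hamming-windowed sinc FIR filter applied with delay compensation, which makes a symmetric (linear-phase) kernel zero-phase. It assumes the Hamming rule of thumb for the order (about 3.3 x sampling rate / transition bandwidth) and treats the cutoff as the -6 dB point at the centre of the transition band. Function names are hypothetical; EEGLAB and MNE-Python also pad the signal edges, which this sketch does not.

```python
import numpy as np


def windowed_sinc(cutoff_hz, fs, trans_bw_hz, kind="highpass"):
    """Hamming-windowed sinc FIR kernel with an odd number of taps (linear phase).

    cutoff_hz is the -6 dB point, centred in a transition band of width trans_bw_hz.
    Filter order follows the Hamming rule of thumb: about 3.3 * fs / trans_bw_hz.
    """
    order = int(np.ceil(3.3 * fs / trans_bw_hz))
    order += order % 2                        # even order -> odd tap count (type I FIR)
    n = np.arange(order + 1) - order // 2     # tap index relative to the kernel centre
    fc = cutoff_hz / fs                       # cutoff in cycles per sample
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(order + 1)
    h /= h.sum()                              # low-pass prototype with unit gain at DC
    if kind == "highpass":
        h = -h
        h[order // 2] += 1.0                  # spectral inversion: unit impulse minus low-pass
    elif kind != "lowpass":
        raise ValueError("kind must be 'highpass' or 'lowpass'")
    return h


def zero_phase_filter(data, h):
    """Filter each channel with a symmetric odd-length kernel and no phase shift."""
    data = np.atleast_2d(np.asarray(data, dtype=float))
    if data.shape[1] <= len(h):
        raise ValueError("signal must be longer than the filter kernel")
    # mode="same" keeps the centred part of the full convolution, which removes the
    # (len(h) - 1) / 2 sample group delay of a linear-phase kernel
    return np.array([np.convolve(ch, h, mode="same") for ch in data])
```

For a 0.1 Hz high-pass at 250 Hz with a 0.2 Hz transition band the kernel has several thousand taps, so a real implementation would convolve via FFT (for example with `scipy.signal.fftconvolve`) instead of `np.convolve`. The same function with `kind="lowpass"` covers the low-pass filters discussed later.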

Step 4: Line Noise Removal

Remove power line noise at 50 Hz (Europe, Asia) or 60 Hz (Americas) and harmonics.

Method | Description | When to Use | Source
Notch filter | Band-stop filter at 50/60 Hz | Simple, but removes neural signal at that frequency; not recommended for oscillatory analysis | --
CleanLine | Adaptive frequency-domain regression | Preferred for most analyses; preserves neural signal near 50/60 Hz | Mullen et al., 2012
ZapLine | Removes line noise via DSS decomposition | Alternative to CleanLine; effective for MEG and EEG | de Cheveigne, 2020
Spectral interpolation | Interpolates the notched frequency band | Preserves spectral continuity | Leske & Dalal, 2019

Recommendation: Use CleanLine or ZapLine rather than notch filters. Notch filters can cause ringing (oscillatory artifacts around sharp transients) and remove real neural oscillatory power in the gamma band near 50/60 Hz (Muthukumaraswamy, 2013).
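The idea behind regression-based line-noise removal can be illustrated by fitting sine and cosine regressors at the line frequency and its harmonics in short windows and subtracting the fit. This is a simplified sketch, not CleanLine: CleanLine uses multitaper spectral estimation and a statistical test to decide which sinusoids to remove, and the window length, harmonic count, and function name here are assumptions.

```python
import numpy as np


def regress_line_noise(data, fs, line_hz=50.0, n_harmonics=3, win_s=4.0):
    """Remove line noise by least-squares sinusoid fitting in consecutive windows.

    data: (n_channels, n_samples). In each window, sine and cosine regressors at
    line_hz and its harmonics (below Nyquist) are fitted and subtracted.
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    n = data.shape[1]
    win = int(round(win_s * fs))
    freqs = [line_hz * k for k in range(1, n_harmonics + 1) if line_hz * k < fs / 2]
    # Window boundaries; the last window absorbs any leftover samples
    bounds = list(range(0, max(n - win, 0) + 1, win)) + [n]
    cleaned = data.copy()
    for start, stop in zip(bounds[:-1], bounds[1:]):
        t = np.arange(start, stop) / fs
        # Design matrix: one sine and one cosine column per line-noise frequency
        X = np.column_stack([fn(2 * np.pi * f * t) for f in freqs for fn in (np.sin, np.cos)])
        beta, *_ = np.linalg.lstsq(X, data[:, start:stop].T, rcond=None)
        cleaned[:, start:stop] -= (X @ beta).T
    return cleaned
```

Because the regressors span only the exact line frequencies, activity at other frequencies is largely left in place, which is the main advantage over a notch filter's stop band.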

Step 5: Re-Referencing

EEG signals are always measured as potential differences relative to a reference. The choice of reference affects all downstream analyses.

Reference Scheme | When to Use | Requirements | Source
Average reference | Default for dense arrays | Dense array (roughly 64+ channels) with good head coverage | Dien, 1998; Luck, 2014
Linked mastoids | Low-density arrays (< 64 ch) | Both mastoid electrodes clean | Luck, 2014
Cz reference | During ICA only (if Cz was the recording reference) | -- | Convention
REST (Reference Electrode Standardization Technique; infinity reference) | Approximates a neutral reference at a point at infinity | Requires a forward (head) model; dense arrays | Yao, 2001

Decision logic:

How many clean channels do you have?
 |
 +-- >= 64 with good head coverage
 | --> Average reference (Dien, 1998)
 |
 +-- 32-63 channels
 | --> Linked mastoids or average reference
 | (average reference becomes unreliable with sparse coverage)
 |
 +-- < 32 channels
 --> Linked mastoids (Luck, 2014)

Domain warning: Average reference assumes dense, uniform electrode coverage of the head. With sparse arrays (< 64 channels) or missing channels, the average reference is biased and can distort topographies (Dien, 1998).
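The decision tree and the two most common schemes fit in a few lines of NumPy. This sketch assumes data as (n_channels, n_samples) and uses hypothetical function names; the optional `good` argument excludes bad channels from the average, in line with marking bad channels before re-referencing.

```python
import numpy as np


def choose_reference(n_clean_channels):
    """Encode the decision tree above (thresholds as given in this guide)."""
    if n_clean_channels >= 64:
        return "average"
    if n_clean_channels >= 32:
        return "linked_mastoids_or_average"
    return "linked_mastoids"


def average_reference(data, good=None):
    """Subtract the mean of the good channels from every channel, sample by sample."""
    good = np.arange(data.shape[0]) if good is None else np.asarray(good)
    return data - data[good].mean(axis=0, keepdims=True)


def linked_mastoid_reference(data, m1, m2):
    """Re-reference to the average of the two mastoid channels (row indices m1, m2)."""
    return data - (data[m1] + data[m2])[None, :] / 2.0
```

Average referencing makes the channels sum to zero at every sample, so the data rank drops by one; ICA should then estimate one fewer component than there are channels.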

Step 6: ICA Decomposition and Artifact Removal

Independent Component Analysis (ICA) separates the EEG signal into statistically independent spatial components, allowing identification and removal of artifact sources (Onton & Makeig, 2006).

ICA Algorithm Selection

Algorithm | Pros | Cons | Source
Infomax (runica) | Standard, well-validated; most commonly used | In its standard (non-extended) form assumes super-Gaussian sources and cannot separate sub-Gaussian sources such as line noise | Bell & Sejnowski, 1995
Extended Infomax | Handles both sub- and super-Gaussian sources | Slightly slower | Lee et al., 1999
AMICA | Most accurate decomposition; can fit a mixture of multiple ICA models | Very slow; requires more data | Palmer et al., 2012
FastICA | Fast computation | Less stable; sensitive to initialization | Hyvarinen, 1999
PICARD | Fast, robust convergence | Newer, less validated | Ablin et al., 2018

Recommendation: Use Extended Infomax (default in EEGLAB) or PICARD (available in MNE-Python as method='picard'; MNE's default is FastICA) for most analyses. AMICA is preferred for high-quality research when computation time is not a constraint.

Data Requirements for ICA

  • Minimum data points: At least 20 * n_channels^2 data points for stable decomposition (Onton & Makeig, 2006). For 64 channels: 20 * 64^2 = 81,920 samples (~5.3 minutes at 256 Hz)
  • High-pass filter at 1 Hz before ICA (Winkler et al., 2015)
  • Remove bad channels before ICA (bad channels produce bad components)
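The data-length rule of thumb is simple arithmetic; this sketch reproduces the 64-channel example above. The function names are hypothetical and k = 20 is the factor quoted in this guide.

```python
def min_ica_samples(n_channels, k=20):
    """Rule-of-thumb minimum number of samples for a stable ICA: k * n_channels^2."""
    return k * n_channels ** 2


def min_ica_minutes(n_channels, fs, k=20):
    """The same requirement expressed as minutes of data at sampling rate fs."""
    return min_ica_samples(n_channels, k) / fs / 60


print(min_ica_samples(64))                 # 81920
print(round(min_ica_minutes(64, 256), 1))  # 5.3
```

If the data rank has been reduced (average reference, interpolated channels), one reasonable choice is to plug in the number of components actually estimated rather than the raw channel count.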

Artifact Component Identification

Automated Classification: ICLabel (Pion-Tonachini et al., 2019)

ICLabel classifies ICA components into 7 categories with probability estimates:

Category | Action | Typical Count
Brain | Keep | Most components
Muscle | Remove if probability > 0.8 | 0-3 components
Eye | Remove (blinks and lateral eye movements) | 1-3 components
Heart | Remove if probability > 0.8 | 0-1 components
Line Noise | Remove if probability > 0.8 | 0-1 components
Channel Noise | Remove if probability > 0.8 | 0-2 components
Other | Review manually (mixed or unresolved sources) | --

Recommended threshold: Remove components classified as non-brain with probability > 0.80 (conservative) or > 0.50 (liberal) (Pion-Tonachini et al., 2019).
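Applying a probability threshold to ICLabel output is a simple selection once the probabilities are in an array. This sketch assumes an (n_components, 7) array whose columns follow the order Brain, Muscle, Eye, Heart, Line Noise, Channel Noise, Other (assumed to match EEGLAB's ordering; check the class list your own output reports). The function name is hypothetical, and components labelled Other are left for manual review rather than removed automatically.

```python
import numpy as np

# Assumed column order of the ICLabel probability matrix
ICLABEL_CLASSES = ["Brain", "Muscle", "Eye", "Heart", "Line Noise", "Channel Noise", "Other"]
ARTIFACT_CLASSES = ["Muscle", "Eye", "Heart", "Line Noise", "Channel Noise"]


def components_to_remove(probs, threshold=0.80):
    """Indices of components whose probability for any artifact class exceeds threshold."""
    probs = np.asarray(probs)
    cols = [ICLABEL_CLASSES.index(c) for c in ARTIFACT_CLASSES]
    return np.flatnonzero((probs[:, cols] > threshold).any(axis=1)).tolist()
```

Passing `threshold=0.50` gives the liberal variant.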

Manual Identification Criteria

Artifact Type | Topography | Time Course | Power Spectrum
Blink | Frontal maximum, bilateral | Sharp transients (~300 ms) | High power at low frequencies (< 5 Hz)
Saccade | Frontal, lateralized (left-right asymmetry) | Step-like deflections | Low-frequency dominated
Cardiac | Broad, diffuse or left-lateralized | Periodic (~1 Hz) | Peak at ~1 Hz
Muscle | Peripheral (temporal, neck electrodes) | High-frequency broadband noise | Elevated power > 20 Hz

Domain insight: Typically remove 1-3 components for eye artifacts and 0-2 for other artifact types. Removing more than 5-6 components total risks removing neural signal. If many components appear artifactual, the data quality may be too poor for reliable analysis (Onton & Makeig, 2006).

Alternative: ASR (Artifact Subspace Reconstruction)

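ASR is a real-time-capable method that identifies and reconstructs artifact-contaminated data segments (Mullen et al., 2015). It learns the statistics of a clean calibration segment; then, in short sliding windows, principal-component subspaces whose variance exceeds the burst criterion (in SDs relative to the calibration data) are removed and reconstructed from the retained subspace using the calibration covariance.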

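Parameter | Default | Conservative | Liberal | Source
Burst criterion (SD) | 20 | 10-15 | 25-30 | Mullen et al., 2015; Chang et al., 2020
Window length | 0.5 s | 0.5 s | 1.0 s | Mullen et al., 2015
Max rejected channels (proportion) | 0.3 | 0.2 | 0.4 | Mullen et al., 2015

Note that a lower burst criterion makes ASR more aggressive: more windows exceed the threshold and are reconstructed. Chang et al. (2020) found values of roughly 20-30 to balance removing artifacts against preserving brain activity.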

When to use ASR vs. ICA:

Is data heavily contaminated with non-stationary artifacts?
 |
 +-- YES --> ASR first (for gross artifact removal), then ICA for residual eye artifacts
 |
 +-- NO --> ICA alone is usually sufficient

Domain insight: ASR and ICA can be combined. Apply ASR first to remove large transient artifacts (burst criterion = 20 SD), then run ICA on the ASR-cleaned data for residual artifact removal (Chang et al., 2020).
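A simplified illustration of the detection half of ASR: project the data onto the principal directions of clean calibration data and flag windows whose RMS in any direction exceeds mean + k x SD of the calibration RMS. This is only a sketch under stated assumptions: real ASR (the clean_rawdata implementation) fits a robust RMS distribution and reconstructs the offending subspace rather than just flagging the window, and the function name is hypothetical.

```python
import numpy as np


def asr_flag_windows(calib, data, fs, k=20.0, win_s=0.5):
    """Flag windows of `data` containing high-variance bursts (detection step only).

    calib, data: (n_channels, n_samples), both high-pass filtered. Thresholds are
    mean + k * SD of each calibration principal component's windowed RMS.
    Returns one boolean per complete window of `data`.
    """
    win = int(round(win_s * fs))
    # Principal directions (covariance eigenvectors) of the clean calibration data
    _, V = np.linalg.eigh(np.cov(calib))

    def windowed_rms(x):
        comps = V.T @ x                               # project onto principal directions
        n_win = comps.shape[1] // win
        segs = comps[:, : n_win * win].reshape(comps.shape[0], n_win, win)
        return np.sqrt((segs ** 2).mean(axis=2))      # shape (n_components, n_windows)

    calib_rms = windowed_rms(calib)
    thresh = calib_rms.mean(axis=1) + k * calib_rms.std(axis=1)
    return (windowed_rms(data) > thresh[:, None]).any(axis=0)
```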

Step 7: Interpolate Bad Channels

After ICA, interpolate the bad channels identified in Step 2.

  • Method: Spherical spline interpolation (Perrin et al., 1989)
  • Maximum interpolation: No more than 10% of channels (Keil et al., 2014)
  • Order: Interpolate after ICA; interpolated channels are linear combinations of other channels and would make the data rank-deficient for ICA
  • Verify: Check that interpolated channel time courses are consistent with neighbors
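Spherical spline interpolation (Perrin et al., 1989) fits the good-channel values with a sum of Legendre-series basis functions plus a constant, then evaluates that fit at the bad-channel positions. This NumPy sketch assumes electrode positions as 3-D Cartesian coordinates and uses m = 4 and 7 Legendre terms, which are common choices but assumptions here; MNE-Python and EEGLAB each have their own implementation and defaults, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def _spline_g(cos_angle, m=4, n_terms=7):
    """g(x) = 1/(4*pi) * sum_{n=1..n_terms} (2n+1) / (n(n+1))^m * P_n(x)."""
    n = np.arange(1, n_terms + 1)
    coefs = np.concatenate([[0.0], (2 * n + 1) / (n * (n + 1)) ** m])
    return legendre.legval(cos_angle, coefs) / (4 * np.pi)


def spherical_spline_interpolate(pos_good, values, pos_bad, m=4, n_terms=7):
    """Estimate values at pos_bad from good channels.

    pos_good: (n_good, 3), pos_bad: (n_bad, 3) Cartesian positions (projected onto
    the unit sphere); values: (n_good, n_samples). Model:
    V(r) = c0 + sum_i c_i g(cos(r, r_i)) with the constraint sum_i c_i = 0.
    """
    pg = pos_good / np.linalg.norm(pos_good, axis=1, keepdims=True)
    pb = pos_bad / np.linalg.norm(pos_bad, axis=1, keepdims=True)
    values = np.asarray(values, dtype=float)
    n = len(pg)

    # Augmented linear system [[G, 1], [1^T, 0]] @ [C; c0] = [values; 0]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = _spline_g(np.clip(pg @ pg.T, -1.0, 1.0), m, n_terms)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    rhs = np.vstack([values, np.zeros((1, values.shape[1]))])
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    weights, c0 = sol[:n], sol[n]

    # Evaluate the fitted spline at the bad-channel positions
    return _spline_g(np.clip(pb @ pg.T, -1.0, 1.0), m, n_terms) @ weights + c0
```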

Step 8: Epoch and Baseline Correct

  • Epoch time window: Typically -200 to 800 ms for ERP; adjust based on component of interest (Luck, 2014)
  • Baseline window: -200 to 0 ms pre-stimulus (standard for ERP; Luck, 2014)
  • Baseline correction: Subtract the mean of the baseline window from every time point in the epoch, separately for each channel and each epoch

Analysis Type | Epoch Window | Baseline Window | Source
Standard ERP | -200 to 800 ms | -200 to 0 ms | Luck, 2014
Late ERP (P600, LPP) | -200 to 1000 ms | -200 to 0 ms | Luck, 2014
MMN | -100 to 400 ms | -100 to 0 ms | Naatanen et al., 2007
Time-frequency | -1000 to 2000 ms | -500 to -200 ms (or single-trial normalization) | Cohen, 2014

Domain warning: For time-frequency analysis, use a longer baseline period (-500 to -200 ms) and avoid the immediate pre-stimulus period to prevent contamination by anticipatory activity. Alternatively, use single-trial baseline normalization (Cohen, 2014).
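Epoching and baseline correction are straightforward array operations. This sketch assumes continuous data as (n_channels, n_samples), event onsets as sample indices, and a hypothetical function name; it defaults to the standard ERP windows (-200 to 800 ms, baseline -200 to 0 ms) and skips events too close to the recording edges.

```python
import numpy as np


def epoch_and_baseline(data, fs, events, tmin=-0.2, tmax=0.8, baseline=(-0.2, 0.0)):
    """Cut epochs around event onsets and subtract the baseline mean.

    data: (n_channels, n_samples); events: onset sample indices.
    Returns epochs as (n_epochs, n_channels, n_times) and the time axis in seconds.
    Events whose window would run past either end of the recording are skipped.
    """
    data = np.asarray(data, dtype=float)
    first, last = int(round(tmin * fs)), int(round(tmax * fs))
    times = np.arange(first, last + 1) / fs
    epochs = np.stack([data[:, ev + first: ev + last + 1]
                       for ev in events
                       if ev + first >= 0 and ev + last < data.shape[1]])
    # Baseline mean per epoch and per channel, subtracted from every time point
    in_base = (times >= baseline[0]) & (times <= baseline[1])
    epochs -= epochs[:, :, in_base].mean(axis=2, keepdims=True)
    return epochs, times
```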

Step 9: Epoch Rejection by Amplitude Threshold

After ICA has removed stereotyped artifacts, apply amplitude-based rejection to catch remaining transient artifacts.

Criterion | Threshold | Source
Peak-to-peak amplitude | Reject if > 100-150 uV | Luck, 2014
Absolute amplitude | Reject if any sample exceeds +/- 75-100 uV | Luck, 2014
Flat epoch | Reject if max - min < 0.5 uV (dead channel/epoch) | Bigdely-Shamlo et al., 2015
Step function (for eye blinks missed by ICA) | Reject if > 80 uV step in 200 ms moving window | Luck, 2014
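The amplitude criteria can be applied together to baseline-corrected epochs. In this sketch the default thresholds are picked from the ranges above, the function name is hypothetical, and the step criterion follows the moving-window idea of comparing the means of the first and second halves of each window; toolboxes such as ERPLAB may define window steps and edges differently.

```python
import numpy as np


def reject_epochs(epochs, fs, p2p_uv=100.0, abs_uv=100.0, flat_uv=0.5,
                  step_uv=80.0, step_win_s=0.2):
    """Return a boolean mask of epochs to reject.

    epochs: (n_epochs, n_channels, n_times) in microvolts, already baseline corrected.
    An epoch is rejected if any channel violates any criterion.
    """
    epochs = np.asarray(epochs, dtype=float)
    ptp = epochs.max(axis=2) - epochs.min(axis=2)            # (n_epochs, n_channels)
    bad = (ptp > p2p_uv).any(axis=1)                          # peak-to-peak
    bad |= (np.abs(epochs) > abs_uv).any(axis=(1, 2))         # absolute amplitude
    bad |= (ptp < flat_uv).any(axis=1)                        # flat channel within epoch

    # Step function: |mean(second half) - mean(first half)| in each moving window
    half = int(round(step_win_s * fs)) // 2
    csum = np.concatenate(
        [np.zeros(epochs.shape[:2] + (1,)), np.cumsum(epochs, axis=2)], axis=2)
    s = np.arange(epochs.shape[2] - 2 * half + 1)             # window start indices
    first = (csum[:, :, s + half] - csum[:, :, s]) / half
    second = (csum[:, :, s + 2 * half] - csum[:, :, s + half]) / half
    bad |= (np.abs(second - first) > step_uv).any(axis=(1, 2))
    return bad
```

`mask.mean()` gives the proportion of rejected epochs, which can be compared directly with the quality benchmarks.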

Quality Benchmarks

Metric | Acceptable | Concerning | Source
Proportion of epochs rejected | < 25% | > 30% indicates poor data quality | Keil et al., 2014
Minimum retained trials per condition | 30+ | < 20 is unreliable for ERPs | Boudewyn et al., 2018
Minimum retained trials (absolute floor) | 15 | < 10 is unusable | Luck, 2014

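Low-Pass Filtering (Optional)

Analysis Type | Low-Pass Cutoff | Source
ERP (visualization and analysis) | 30 Hz | Luck, 2014
ERP (preserving high-frequency info) | 40 Hz | Luck, 2014
Oscillatory (alpha, beta) | No low-pass or 100 Hz | Cohen, 2014
Oscillatory (gamma) | No low-pass or 200 Hz | Cohen, 2014

Domain warning: Apply low-pass filters to the continuous data (or to averaged ERPs with adequate padding) rather than to short epochs, because filtering short segments produces edge artifacts. Heavier low-pass filtering (e.g., 20 Hz) attenuates and broadens peaks; zero-phase filters introduce no systematic delay, but they spread activity backward in time, which can make effect onsets appear earlier. A 20 Hz low-pass is therefore common for visualizing grand averages but should be used with care before statistical analysis of peak amplitudes or onset latencies (Luck, 2014).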

Common Pitfalls

  1. Using 1 Hz high-pass for ERP analysis: A 1 Hz cutoff distorts slow ERP components. Use 0.1 Hz for final data; apply 1 Hz only for ICA training (Tanner et al., 2015; Acunzo et al., 2012)
  2. Average reference with too few channels: Average reference with < 64 channels and incomplete head coverage biases topographies (Dien, 1998)
  3. Running ICA on unfiltered data: Low-frequency drift degrades ICA decomposition quality. Always high-pass at 1 Hz before ICA (Winkler et al., 2015)
  4. Removing too many ICA components: Removing > 5-6 components risks removing neural signal. If many components are artifactual, the data quality is too poor (Onton & Makeig, 2006)
  5. Interpolating before ICA: Interpolated channels are linear combinations of other channels, which makes the data rank-deficient and can produce spurious ICA components. Interpolate after ICA (Luck, 2014)
  6. Using causal (single-pass) IIR filters: Single-pass IIR filters such as Butterworth introduce phase distortion that shifts ERP peak latencies. Use FIR zero-phase filters, or apply IIR filters forward and backward (two-pass) to cancel the phase shift (Widmann et al., 2015)
  7. Not checking the number of retained trials: If artifact rejection removes > 25% of trials, reconsider data quality or preprocessing parameters (Keil et al., 2014)
  8. Applying notch filters for oscillatory analysis: Notch filters remove real neural gamma activity near 50/60 Hz. Use CleanLine or ZapLine instead (Muthukumaraswamy, 2013)

Minimum Reporting Checklist

Based on Keil et al. (2014) and Luck (2014):

  • Sampling rate (original and any downsampling applied)
  • High-pass filter cutoff, type (FIR/IIR), order, transition bandwidth
  • Low-pass filter cutoff, type, order (if applied)
  • Line noise removal method (notch, CleanLine, ZapLine)
  • Re-referencing scheme (average, linked mastoids, etc.) and when applied
  • Bad channel identification criteria and number removed
  • Bad channel interpolation method (spherical spline)
  • ICA algorithm used and number of components computed
  • Artifact component identification method (manual, ICLabel, ADJUST) and criteria
  • Number and type of components removed (mean and range across subjects)
  • ASR parameters if used (burst criterion, window length)
  • Epoch time window and baseline correction window
  • Epoch rejection criteria (thresholds) and proportion rejected (mean and range)
  • Minimum number of retained trials per condition
  • Software package and version (EEGLAB, MNE-Python, FieldTrip)

References

  • Ablin, P., Cardoso, J. F., & Gramfort, A. (2018). Faster independent component analysis by preconditioning with Hessian approximations. IEEE Transactions on Signal Processing, 66(15), 4040-4049.
  • Acunzo, D. J., MacKenzie, G., & van Rossum, M. C. W. (2012). Systematic biases in early ERP and ERF components as a result of high-pass filtering. Journal of Neuroscience Methods, 209(1), 212-218.
  • Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1129-1159.
  • Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K. M., & Robbins, K. A. (2015). The PREP pipeline: Standardized preprocessing for large-scale EEG analysis. Frontiers in Neuroinformatics, 9, 16.
  • Boudewyn, M. A., Luck, S. J., Farrens, J. L., & Kappenman, E. S. (2018). How many trials does it take to get a significant ERP effect? Psychophysiology, 55(6), e13049.
  • Chang, C. Y., Hsu, S. H., Pion-Tonachini, L., & Jung, T. P. (2020). Evaluation of artifact subspace reconstruction for automatic artifact components removal in multi-channel EEG recordings. IEEE Transactions on Biomedical Engineering, 67(4), 1114-1121.
  • Cohen, M. X. (2014). Analyzing Neural Time Series Data: Theory and Practice. MIT Press.
  • de Cheveigne, A. (2020). ZapLine: A simple and effective method to remove power line artifacts. NeuroImage, 207, 116356.
  • Dien, J. (1998). Issues in the application of the average reference: Review, critiques, and recommendations. Behavior Research Methods, Instruments, & Computers, 30(1), 34-43.
  • Hyvarinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3), 626-634.
  • Keil, A., Debener, S., Gratton, G., et al. (2014). Committee report: Publication guidelines and recommendations for studies using electroencephalography and magnetoencephalography. Psychophysiology, 51(1), 1-21.
  • Lee, T. W., Girolami, M., & Sejnowski, T. J. (1999). Independent component analysis using an extended infomax algorithm for mixed subgaussian and supergaussian sources. Neural Computation, 11(2), 417-441.
  • Leske, S., & Dalal, S. S. (2019). Reducing power line noise in EEG and MEG data via spectrum interpolation. NeuroImage, 189, 763-776.
  • Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique (2nd ed.). MIT Press.
  • Mullen, T. R., Kothe, C. A. E., Chi, Y. M., et al. (2015). Real-time neuroimaging and cognitive monitoring using wearable dry EEG. IEEE Transactions on Biomedical Engineering, 62(11), 2553-2567.
  • Muthukumaraswamy, S. D. (2013). High-frequency brain activity and muscle artifacts in MEG/EEG: A review and recommendations. Frontiers in Human Neuroscience, 7, 138.
  • Naatanen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN). Clinical Neurophysiology, 118(12), 2544-2590.
  • Onton, J., & Makeig, S. (2006). Information-based modeling of event-related brain dynamics. Progress in Brain Research, 159, 99-120.
  • Palmer, J. A., Kreutz-Delgado, K., & Makeig, S. (2012). AMICA: An adaptive mixture of independent component analyzers with shared components. Technical Report, Swartz Center for Computational Neuroscience.
  • Perrin, F., Pernier, J., Bertrand, O., & Echallier, J. F. (1989). Spherical splines for scalp potential and current density mapping. Electroencephalography and Clinical Neurophysiology, 72(2), 184-187.
  • Pion-Tonachini, L., Kreutz-Delgado, K., & Makeig, S. (2019). ICLabel: An automated electroencephalographic independent component classifier, dataset, and website. NeuroImage, 198, 181-197.
  • Tanner, D., Morgan-Short, K., & Luck, S. J. (2015). How inappropriate high-pass filters can produce artifactual effects and incorrect conclusions in ERP studies of language and cognition. Psychophysiology, 52(8), 997-1009.
  • Widmann, A., Schroger, E., & Maess, B. (2015). Digital filter design for electrophysiological data -- A practical approach. Journal of Neuroscience Methods, 250, 34-46.
  • Winkler, I., Debener, S., Muller, K. R., & Tangermann, M. (2015). On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. Proceedings of EMBC, 4101-4105.
  • Yao, D. (2001). A method to standardize a reference of scalp EEG recordings to a point at infinity. Physiological Measurement, 22(4), 693-711.

See references/ for step-by-step pipeline code templates and parameter lookup tables.
