fMRI GLM Analysis Guide
Purpose
The General Linear Model (GLM) is the standard statistical framework for task-based fMRI analysis. It models the observed BOLD time series as a linear combination of expected signal components (task regressors convolved with the hemodynamic response function) plus confound regressors plus noise (Poline & Brett, 2012; Poldrack et al., 2011, Ch. 4).
This skill encodes the domain-specific judgment needed to correctly specify a GLM for fMRI data. A competent programmer without neuroimaging training would get many of these decisions wrong -- choosing the wrong HRF model, setting an inappropriate high-pass filter cutoff, omitting critical confound regressors, or applying invalid statistical thresholds. Each decision described here requires understanding the biophysics of BOLD signal, the noise characteristics of fMRI data, and the statistical assumptions of the model.
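The core of the GLM is ordinary least squares on a design matrix. As a minimal sketch (names, shapes, and the simulated effect sizes below are illustrative, not tied to any specific package):

```python
import numpy as np

# Minimal GLM sketch: y = X @ beta + noise, solved by ordinary least squares.
rng = np.random.default_rng(0)
n_scans = 200

# Design matrix: one task regressor (boxcar) plus an intercept.
task = np.zeros(n_scans)
task[20:40] = 1.0
task[100:120] = 1.0
X = np.column_stack([task, np.ones(n_scans)])

# Simulate a voxel time series with a known task effect of 2.0 over a baseline of 10.0.
y = X @ np.array([2.0, 10.0]) + rng.normal(0, 0.5, n_scans)

# OLS estimate: beta_hat = (X'X)^-1 X'y (via lstsq for numerical stability).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # task effect near 2.0, baseline near 10.0
```

Everything in the rest of this guide — HRF convolution, filtering, confounds, prewhitening — is about constructing X and the noise model so that this fit is valid.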
When to Use This Skill
- Specifying a first-level (single-subject) GLM for task fMRI
- Choosing HRF models, confound regressors, and temporal filtering parameters
- Defining contrasts to test experimental hypotheses
- Setting up second-level (group) analyses
- Selecting appropriate multiple comparison correction methods
- Reviewing or troubleshooting an existing fMRI analysis pipeline
Research Planning Protocol
Before executing the domain-specific steps below, you MUST:
- State the research question — What specific brain-behavior hypothesis is this GLM testing?
- Justify the method choice — Why task fMRI with GLM? What alternatives (resting-state, EEG, behavior-only) were considered?
- Declare expected outcomes — Which regions/contrasts do you expect to show effects, and in what direction?
- Note assumptions and limitations — What does the GLM assume? Where could it mislead?
- Present the plan to the user and WAIT for confirmation before proceeding.
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
First-Level Analysis Decision Tree
Step 1: HRF Model Selection
The hemodynamic response function (HRF) models the neurovascular coupling delay between neural activity and the measured BOLD signal. The canonical HRF peaks at approximately 5-6 seconds post-stimulus (Glover, 1999).
| HRF Model | When to Use | Trade-off |
|---|---|---|
| Canonical (double-gamma) | Default for most task designs when timing is well-established | Assumes fixed HRF shape; highest statistical power (1 parameter per condition) (Lindquist et al., 2009) |
| Canonical + temporal derivative | When peak latency may vary by ~1 s across conditions or regions | Captures timing shifts; 2 parameters per condition (Friston et al., 1998; Henson et al., 2002) |
| Canonical + temporal + dispersion derivatives | When both latency and width of HRF may vary | Maximum flexibility with basis functions; 3 parameters per condition, reduced power (Henson et al., 2002) |
| Finite Impulse Response (FIR) | When HRF shape is unknown or expected to deviate substantially from canonical form | No shape assumption; many parameters (one per time bin); requires many trials for stable estimation (Glover, 1999; Dale, 1999) |
Decision logic:
Is the HRF shape well-established for your task and population?
|
+-- YES --> Is timing precision critical to your hypothesis?
| |
| +-- YES --> Canonical + temporal derivative
| |
| +-- NO --> Canonical HRF (default)
|
+-- NO --> Do you have enough trials (>40 per condition) for stable estimation?
|
+-- YES --> FIR model (exploratory) or canonical + derivatives
|
+-- NO --> Canonical + temporal derivative (safest compromise)
Domain warning: When using derivative basis functions, the contrast for the main condition should weight only the canonical regressor (not the derivatives). An F-test across all basis functions tests whether any component differs from zero (Calhoun et al., 2004). See references/design-matrix-guide.md for details.
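The canonical double-gamma shape can be sketched directly. The shape parameters below (peak gamma with shape 6, undershoot with shape 16, undershoot ratio 1/6) are the commonly cited defaults; treat them as an illustration rather than a verified match to any specific software release:

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(tr=1.0, duration=32.0):
    """Canonical double-gamma HRF sketch: a positive gamma response
    minus a scaled, later gamma undershoot, peak-normalized."""
    t = np.arange(0, duration, tr)
    peak = gamma.pdf(t, 6)          # main response, mode at t = 5 s
    undershoot = gamma.pdf(t, 16)   # post-stimulus undershoot, mode at t = 15 s
    hrf = peak - undershoot / 6.0
    return hrf / hrf.max()

hrf = double_gamma_hrf(tr=0.5)
t = np.arange(0, 32, 0.5)
print(t[np.argmax(hrf)])  # peaks near 5 s post-stimulus
```

Convolving each condition's onset train with this kernel (and optionally its temporal derivative) yields the task regressors of the design matrix.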
Step 2: High-Pass Filter Cutoff
Low-frequency drifts from scanner instability, subject physiology, and slow head motion must be removed. This is implemented as a discrete cosine transform (DCT) basis set added to the design matrix (Poldrack et al., 2011, Ch. 5).
- Default cutoff period: 128 seconds (0.0078 Hz) -- the SPM default, appropriate for most event-related and block designs (Poldrack et al., 2011, Ch. 5)
- Rule: The cutoff period must be at least 2x the longest interval between events of the same condition type (Poldrack et al., 2011, Ch. 5)
- Block designs: For a block design with 30-second blocks alternating between two conditions, the full cycle is 60 seconds. A 128-second cutoff safely preserves this signal
- Slow event-related designs: If trials of the same condition occur every 90 seconds, a 128-second cutoff is too aggressive. Use at least 180 seconds (2 x 90)
Domain warning: Setting the cutoff frequency too low (period too long) leaves drift in the data, inflating noise estimates. Setting it too high (period too short) attenuates your experimental signal. Always verify that your design's fundamental frequency is preserved by the filter.
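A DCT drift basis is straightforward to construct. The regressor count below follows the usual rule of roughly 2 × scan duration / cutoff period; exact conventions vary by package, so treat this as a sketch:

```python
import numpy as np

def dct_highpass_basis(n_scans, tr, cutoff_s=128.0):
    """SPM-style discrete cosine drift basis (sketch). Each column is a
    slow cosine; adding these to the design matrix removes fluctuations
    with periods longer than cutoff_s."""
    duration = n_scans * tr
    n_basis = int(np.floor(2.0 * duration / cutoff_s))
    t = np.arange(n_scans)
    basis = [np.cos(np.pi * k * (2 * t + 1) / (2 * n_scans))
             for k in range(1, n_basis + 1)]
    return np.column_stack(basis) if basis else np.zeros((n_scans, 0))

# 300 scans at TR = 2 s -> a 600 s run; a 128 s cutoff yields 9 drift columns.
drift = dct_highpass_basis(300, 2.0, 128.0)
print(drift.shape)  # (300, 9)
```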
Step 3: Confound Regression
Confound regressors model variance from non-neural sources. Omitting them inflates false positive rates; including too many reduces statistical power.
Motion Parameters
Head motion is the single largest source of structured artifact in fMRI (Power et al., 2012).
| Model | Parameters | When to Use | Source |
|---|---|---|---|
| Standard 6-parameter | 3 translation + 3 rotation | Minimum acceptable model | Friston et al., 1996 |
| 24-parameter (Friston) | 6 current + 6 prior timepoint + 12 squared terms | Recommended default for task fMRI | Friston et al., 1996 |
| 6-parameter + derivatives | 6 current + 6 temporal derivatives | Intermediate model | Satterthwaite et al., 2013 |
Domain insight: The 24-parameter model (Friston et al., 1996) captures both linear and nonlinear motion effects, including spin-history artifacts from previous-timepoint head positions. The squared terms model the nonlinear relationship between motion and BOLD signal changes.
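The 24-parameter expansion is a mechanical transform of the 6 realignment parameters. In this sketch, padding the one-scan-back terms with a zero row is one common convention:

```python
import numpy as np

def friston24(motion):
    """Expand 6 rigid-body parameters (n_scans x 6) to the 24-parameter
    Friston model: current params, one-scan-back params, and the squares
    of both (sketch)."""
    shifted = np.vstack([np.zeros((1, motion.shape[1])), motion[:-1]])
    block = np.hstack([motion, shifted])          # 12 linear terms
    return np.hstack([block, block ** 2])         # + 12 squared terms

rng = np.random.default_rng(1)
mp = rng.normal(0, 0.1, size=(180, 6))            # stand-in realignment output
conf = friston24(mp)
print(conf.shape)  # (180, 24)
```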
Framewise Displacement and Spike Regressors
- Calculate framewise displacement (FD) as the sum of absolute values of the six motion derivatives, with rotations converted to mm on a 50-mm radius sphere (Power et al., 2012)
- Flag volumes with FD > 0.5 mm as motion spikes (Power et al., 2012); for more stringent control, use FD > 0.2 mm (Power et al., 2014)
- Model each spike as a separate indicator regressor (one column per flagged volume). For ordinary least squares estimation this is equivalent to censoring/scrubbing the flagged volumes, while preserving the temporal structure of the design matrix
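The FD calculation and spike-regressor construction above can be sketched as follows (assuming motion columns 0-2 are translations in mm and 3-5 are rotations in radians):

```python
import numpy as np

def framewise_displacement(motion, radius=50.0):
    """FD per Power et al. (2012): sum of absolute backward differences
    of the six parameters, rotations converted to mm as arc length on a
    sphere of the given radius."""
    d = np.diff(motion, axis=0)
    d[:, 3:] *= radius                      # angle (rad) * radius (mm)
    return np.concatenate([[0.0], np.abs(d).sum(axis=1)])

def spike_regressors(fd, threshold=0.5):
    """One indicator column per volume exceeding the FD threshold."""
    flagged = np.flatnonzero(fd > threshold)
    spikes = np.zeros((fd.size, flagged.size))
    spikes[flagged, np.arange(flagged.size)] = 1.0
    return spikes

motion = np.zeros((100, 6))
motion[40, 0] = 0.8                          # a single 0.8 mm translation jump
fd = framewise_displacement(motion)
print(spike_regressors(fd).shape)            # (100, 2): the jump in and out both flag
```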
Physiological Noise: CompCor
When physiological recordings (pulse, respiration) are unavailable:
- aCompCor (anatomical CompCor): Extract top 5-6 principal components from white matter and CSF masks (Behzadi et al., 2007; Muschelli et al., 2014)
- tCompCor (temporal CompCor): Extract components from high-temporal-variance voxels (Behzadi et al., 2007)
- aCompCor is generally preferred because it uses anatomically defined noise ROIs and is less susceptible to removing neural signal (Ciric et al., 2017)
Domain warning -- global signal regression: Regressing out the global mean signal is controversial. It improves motion artifact removal but introduces artifactual anticorrelations in functional connectivity analyses (Murphy & Fox, 2017). For task-based GLM, global signal regression is generally not recommended unless specifically justified.
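At its core, aCompCor is a PCA of time series from anatomically defined noise masks. A minimal sketch via SVD (variance-normalizing each voxel first is one common choice; real pipelines also detrend and apply the same temporal filtering as the main model):

```python
import numpy as np

def acompcor(noise_ts, n_components=5):
    """aCompCor sketch: top principal component time courses of a
    noise-ROI data matrix (n_scans x n_voxels), e.g. WM/CSF voxels."""
    X = noise_ts - noise_ts.mean(axis=0)
    sd = X.std(axis=0)
    X = X / np.where(sd > 0, sd, 1.0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components]   # orthonormal component time courses

rng = np.random.default_rng(2)
wm_csf = rng.normal(size=(200, 500))   # stand-in for WM/CSF voxel time series
comps = acompcor(wm_csf, n_components=5)
print(comps.shape)  # (200, 5): five confound columns for the design matrix
```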
Step 4: Serial Autocorrelation Correction
fMRI time series exhibit temporal autocorrelation due to hemodynamic smoothing and physiological noise. Ignoring this inflates t-statistics and false positive rates (Woolrich et al., 2001).
| Method | Implementation | Software |
|---|---|---|
| AR(1) prewhitening | Models autocorrelation as first-order autoregressive process | SPM (default), Nilearn |
| ARMA(1,1) | Autoregressive moving-average; more flexible | AFNI (3dREMLfit) |
| Tukey taper prewhitening | Nonparametric spectral smoothing of autocorrelation | FSL FILM (Woolrich et al., 2001) |
Domain insight: Recent work has shown that AR(1) may be insufficient for modern multiband acquisitions with short TRs (< 1 s), where higher-order autocorrelation structure is present (Olszowy et al., 2019). For short-TR data, consider ARMA(1,1) or FSL's FILM approach.
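The idea behind AR(1) prewhitening can be shown in a few lines. This is a bare Cochrane-Orcutt-style sketch — estimate rho from OLS residuals, transform both sides, refit; real packages pool and regularize the autocorrelation estimate across voxels:

```python
import numpy as np

def ar1_prewhiten(y, X):
    """AR(1) prewhitening sketch: estimate rho as the lag-1
    autocorrelation of OLS residuals, then apply the transform
    y_t - rho * y_{t-1} to data and design (first scan dropped)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    rho = (r[:-1] @ r[1:]) / (r @ r)
    yw = y[1:] - rho * y[:-1]
    Xw = X[1:] - rho * X[:-1]
    return yw, Xw, rho

rng = np.random.default_rng(3)
n = 500
e = np.zeros(n)
for i in range(1, n):                 # simulate AR(1) noise with rho = 0.4
    e[i] = 0.4 * e[i - 1] + rng.normal()
X = np.ones((n, 1))                   # intercept-only design
yw, Xw, rho = ar1_prewhiten(e + 1.0, X)
print(rho)                            # should recover a value close to 0.4
```

OLS on (yw, Xw) then yields t-statistics with approximately correct null distributions, which raw OLS on (y, X) would not.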
Step 5: Construct and Verify the Design Matrix
Before fitting the model:
- Visualize the design matrix -- Check for empty or near-empty columns, which indicate conditions with no events or events lost to the high-pass filter
- Check condition-regressor correlation -- Correlation > 0.5 between task regressors indicates collinearity, reducing power and making contrasts uninterpretable (Mumford et al., 2015)
- Verify the high-pass filter preserves signal -- Overlay the filter cutoff on the power spectrum of each task regressor
- Inspect the variance inflation factor (VIF) -- VIF > 5 for any regressor indicates problematic multicollinearity (Poldrack et al., 2011, Ch. 4)
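The VIF check can be computed from the inverse correlation matrix of the regressors. A sketch, assuming X contains only the (mean-centerable) task and confound columns, not the intercept:

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column: the diagonal of the inverse
    of the columns' correlation matrix. VIF_j = 1 / (1 - R^2_j), where
    R^2_j comes from regressing column j on all the others."""
    R = np.corrcoef(X - X.mean(axis=0), rowvar=False)
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(4)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.1 * rng.normal(size=300)   # nearly collinear with a
X = np.column_stack([a, b, c])
print(vif(X))  # VIFs for a and c exceed 5; b stays near 1
```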
See references/design-matrix-guide.md for detailed guidance on design matrix construction.
Contrast Specification
Contrasts define the specific hypotheses you test within the fitted GLM.
t-Contrasts (Directional)
A t-contrast is a single row vector of weights applied to the parameter estimates. It tests a directional hypothesis.
| Contrast Type | Weight Vector Example | Tests |
|---|---|---|
| Activation vs. baseline | [1 0 0 ...] | Is condition A > 0? |
| Condition difference | [1 -1 0 ...] | Is condition A > condition B? |
| Linear trend | [-1 0 1 ...] | Does activation increase linearly across 3 levels? |
| Interaction (2x2) | [1 -1 -1 1 ...] | Does the difference A1-A2 differ from B1-B2? |
Rules for valid t-contrasts (Poline & Brett, 2012):
- Weights must sum to zero for difference contrasts
- Only task regressors should receive non-zero weights (never confound regressors)
- When using derivative basis functions, contrast only the canonical regressor
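A t-contrast reduces to the textbook formula t = c'beta / sqrt(sigma2 * c'(X'X)^-1 c). The sketch below applies it to an OLS fit; in practice it should run on prewhitened data and design (Step 4):

```python
import numpy as np

def t_contrast(X, y, c):
    """t-statistic for contrast vector c on an OLS fit (sketch)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    df = X.shape[0] - np.linalg.matrix_rank(X)
    resid = y - X @ beta
    sigma2 = resid @ resid / df
    xtx_inv = np.linalg.pinv(X.T @ X)
    return (c @ beta) / np.sqrt(sigma2 * c @ xtx_inv @ c)

rng = np.random.default_rng(5)
n = 200
cond_a = (np.arange(n) % 40 < 10).astype(float)          # interleaved blocks
cond_b = ((np.arange(n) + 20) % 40 < 10).astype(float)
X = np.column_stack([cond_a, cond_b, np.ones(n)])
y = X @ np.array([3.0, 1.0, 10.0]) + rng.normal(0, 1.0, n)

# [1, -1, 0]: A > B, with zero weight on the intercept (a confound column).
print(t_contrast(X, y, np.array([1.0, -1.0, 0.0])))      # large positive t
```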
F-Contrasts (Omnibus)
An F-contrast tests whether any of several effects are non-zero. It is specified as a matrix (multiple rows).
| Use Case | When to Use |
|---|---|
| Main effect of factor | Testing whether any level of a factor differs from any other |
| HRF model with derivatives | Testing whether the canonical + derivative basis set captures any response |
| Any-difference test | Testing whether any condition differs from baseline |
Domain insight: An F-test for the full basis set (canonical + derivatives) tests whether there is any evoked response, regardless of its exact timing or shape. This is more sensitive than a t-test on the canonical regressor alone when the true HRF deviates from canonical form (Calhoun et al., 2004).
Second-Level Analysis (Group Inference)
First-level contrast maps (one per subject) serve as input to the group model.
Fixed Effects vs. Mixed Effects
| Approach | Models | Generalizability | When to Use | Source |
|---|---|---|---|---|
| Fixed effects (FFX) | Within-subject variance only | Only to the scanned subjects | Multiple runs within one subject | Poldrack et al., 2011, Ch. 8 |
| Mixed effects (MFX) | Within- + between-subject variance | To the population | Group inference across subjects | Mumford & Nichols, 2009 |
| OLS (summary statistics) | Between-subject variance only | To the population (if homogeneity holds) | Standard group analysis; valid and near-optimal under moderate variance heterogeneity | Mumford & Nichols, 2009 |
Decision logic:
Are you combining runs within one subject?
|
+-- YES --> Fixed effects (concatenation or run-by-run with FFX)
|
+-- NO --> Are you making group-level inferences?
|
+-- YES --> Mixed effects (FLAME in FSL) or OLS summary statistics
OLS is valid and near-optimal for balanced designs
(Mumford & Nichols, 2009)
Domain warning: Using a fixed-effects analysis for group inference treats between-subject variability as zero, dramatically inflating false positive rates. Results would apply only to the specific subjects scanned, not to the population (Friston et al., 2005; Mumford & Nichols, 2009).
Common Second-Level Designs
- One-sample t-test: Design matrix = intercept only (column of ones). Tests group mean activation against zero
- Two-sample t-test: Compares two groups (e.g., patients vs. controls). Include group-mean confounds (age, motion) as covariates
- Paired t-test: Within-subject comparison of two conditions at the group level
- ANOVA / Flexible factorial: Multiple factors; requires careful specification of variance components
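For the one-sample case, the summary-statistics approach at a single voxel reduces to a standard one-sample t-test on the per-subject contrast values (the intercept-only design column of ones makes the GLM and the t-test identical):

```python
import numpy as np
from scipy import stats

# One first-level contrast value per subject at a given voxel; simulated
# here with a true group mean of 0.5 (illustrative values).
rng = np.random.default_rng(6)
con_values = rng.normal(loc=0.5, scale=1.0, size=25)   # 25 subjects

t, p = stats.ttest_1samp(con_values, popmean=0.0)
print(t, p)
```

Whole-brain group analysis applies this (or its mixed-effects refinement) at every voxel, which is what makes the multiple comparison correction in the next section necessary.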
Multiple Comparison Correction
With approximately 100,000 voxels tested simultaneously, correction for multiple comparisons is essential. See references/statistical-inference.md for detailed guidance.
Quick Reference
| Method | Controls | Recommended Threshold | When to Use |
|---|---|---|---|
| Voxelwise FWE (RFT) | Family-wise error | p < 0.05 FWE | Highly localized effects expected (Worsley et al., 1996) |
| FDR (Benjamini-Hochberg) | False discovery rate | q < 0.05 | Distributed effects; moderate correction (Genovese et al., 2002) |
| Cluster-based (RFT) | Cluster-level FWE | CDT p < 0.001, then cluster p < 0.05 FWE | Standard approach; use CDT of p < 0.001 (Eklund et al., 2016) |
| TFCE | Voxelwise FWE via permutation | p < 0.05 FWE-corrected | No arbitrary CDT; good sensitivity (Smith & Nichols, 2009) |
| Permutation testing | FWE (nonparametric) | p < 0.05 FWE | Gold standard; no distributional assumptions (Nichols & Holmes, 2002) |
Critical domain knowledge: Cluster-based inference with a cluster-defining threshold (CDT) of p < 0.01 produces inflated false positive rates (up to 70% instead of the nominal 5%). Always use CDT of p < 0.001 or stricter (Eklund et al., 2016). For permutation tests, use at least 5,000-10,000 permutations for publication-quality results (Nichols & Holmes, 2002).
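Of the methods above, the Benjamini-Hochberg FDR step-up procedure is simple enough to sketch directly (this is the standard version for independent or positively dependent tests):

```python
import numpy as np

def fdr_bh(p, q=0.05):
    """Benjamini-Hochberg FDR (sketch): find the largest k with
    p_(k) <= (k/m) * q over the sorted p-values, then reject every
    p-value at or below p_(k). Returns a boolean mask over the input."""
    p = np.asarray(p)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * q
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.flatnonzero(below))
        mask[order[: k + 1]] = True
    return mask

pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.2, 0.6])
print(fdr_bh(pvals, q=0.05))  # only the first two survive at q < 0.05
```

For FWE via random field theory, TFCE, or permutation testing, use the established implementations (SPM, FSL randomise, etc.) rather than hand-rolled code.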
Common Pitfalls
- Double-dipping / circular analysis: Selecting voxels or ROIs based on the effect of interest, then testing that same effect in the selected data. This inflates effect sizes and produces invalid statistics (Kriegeskorte et al., 2009). Use independent data for selection and testing, or use a priori ROIs
- Collinear regressors: Highly correlated task regressors make individual contrasts uninterpretable. Check the correlation matrix of the design matrix; correlations > 0.5 between task regressors are problematic (Mumford et al., 2015)
- HRF misspecification: Using the canonical HRF when the true response deviates substantially (e.g., in pediatric or clinical populations) produces biased estimates and reduced power (Lindquist et al., 2009). Consider adding temporal derivatives or using FIR
- High-pass filter too aggressive: If the filter cutoff period is shorter than the longest inter-event interval, the task signal is attenuated. Always verify cutoff against your experimental timing
- Ignoring temporal autocorrelation: Using ordinary least squares (OLS) without prewhitening inflates t-statistics and false positive rates (Woolrich et al., 2001)
- Omitting motion regressors: Motion artifacts are the largest source of false positives in fMRI. At minimum, include 6 rigid-body parameters; the 24-parameter model is preferred (Power et al., 2012; Friston et al., 1996)
- Using uncorrected thresholds for inference: p < 0.001 uncorrected is insufficient for publication. Always apply FWE, FDR, or permutation-based correction (Eklund et al., 2016)
- Fixed-effects for group inference: Treats between-subject variance as zero; results do not generalize beyond the scanned sample (Mumford & Nichols, 2009)
Minimum Reporting Checklist
Based on the OHBM COBIDAS guidelines (Nichols et al., 2017) and Poldrack et al. (2008):
- Software package and version used for GLM estimation
- HRF model (canonical, derivatives, FIR) and any assumed parameters
- High-pass filter cutoff (in seconds or Hz) and implementation (DCT, Butterworth)
- Motion regressors included (6-parameter, 24-parameter, spike regressors)
- Additional confound regressors (CompCor components, global signal, etc.)
- Serial autocorrelation model (AR(1), ARMA(1,1), prewhitening approach)
- Contrast specification (weight vectors for each contrast tested)
- Second-level model type (fixed effects, mixed effects, OLS) and covariates
- Multiple comparison correction method, thresholds, and software
- For cluster-based inference: cluster-defining threshold and cluster-level threshold
- Smoothing kernel FWHM applied before group analysis (typically 6-8 mm for standard 2-mm voxels; Poldrack et al., 2011, Ch. 5)
References
- Behzadi, Y., Restom, K., Liau, J., & Liu, T. T. (2007). A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. NeuroImage, 37(1), 90-101.
- Calhoun, V. D., Stevens, M. C., Pearlson, G. D., & Kiehl, K. A. (2004). fMRI analysis with the general linear model: Removal of latency-induced amplitude bias by incorporation of hemodynamic derivative terms. NeuroImage, 22(1), 252-257.
- Ciric, R., Wolf, D. H., Power, J. D., et al. (2017). Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity. NeuroImage, 154, 174-187.
- Dale, A. M. (1999). Optimal experimental design for event-related fMRI. Human Brain Mapping, 8(2-3), 109-114.
- Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. PNAS, 113(28), 7900-7905.
- Friston, K. J., Fletcher, P., Josephs, O., Holmes, A., Rugg, M. D., & Turner, R. (1998). Event-related fMRI: Characterizing differential responses. NeuroImage, 7(1), 30-40.
- Friston, K. J., Stephan, K. E., Lund, T. E., Morcom, A., & Kiebel, S. (2005). Mixed-effects and fMRI studies. NeuroImage, 24(1), 244-252.
- Friston, K. J., Williams, S., Howard, R., Frackowiak, R. S. J., & Turner, R. (1996). Movement-related effects in fMRI time-series. Magnetic Resonance in Medicine, 35(3), 346-355.
- Genovese, C. R., Lazar, N. A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage, 15(4), 870-878.
- Glover, G. H. (1999). Deconvolution of impulse response in event-related BOLD fMRI. NeuroImage, 9(4), 416-429.
- Henson, R. N. A., Price, C. J., Rugg, M. D., Turner, R., & Friston, K. J. (2002). Detecting latency differences in event-related BOLD responses. NeuroImage, 15(1), 83-97.
- Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12(5), 535-540.
- Lindquist, M. A. (2008). The statistical analysis of fMRI data. Statistical Science, 23(4), 439-464.
- Lindquist, M. A., Loh, J. M., Atlas, L. Y., & Wager, T. D. (2009). Modeling the hemodynamic response function in fMRI: Efficiency, bias and mis-modeling. NeuroImage, 45(1 Suppl), S187-S198.
- Mumford, J. A., & Nichols, T. E. (2009). Simple group fMRI modeling and inference. NeuroImage, 47(4), 1469-1475.
- Mumford, J. A., Poline, J. B., & Poldrack, R. A. (2015). Orthogonalization of regressors in fMRI models. PLoS ONE, 10(4), e0126255.
- Murphy, K., & Fox, M. D. (2017). Towards a consensus regarding global signal regression for resting state functional connectivity MRI. NeuroImage, 154, 169-173.
- Muschelli, J., Nebel, M. B., Caffo, B. S., et al. (2014). Reduction of motion-related artifacts in resting state fMRI using aCompCor. NeuroImage, 96, 22-35.
- Nichols, T. E., Das, S., Eickhoff, S. B., et al. (2017). Best practices in data analysis and sharing in neuroimaging using MRI (COBIDAS). Nature Neuroscience, 20(3), 299-303.
- Nichols, T. E., & Holmes, A. P. (2002). Nonparametric permutation tests for functional neuroimaging: A primer with examples. Human Brain Mapping, 15(1), 1-25.
- Olszowy, W., Aston, J., Rua, C., & Williams, G. B. (2019). Accurate autocorrelation modeling substantially improves fMRI reliability. Nature Communications, 10, 1220.
- Poldrack, R. A., Fletcher, P. C., Henson, R. N., Worsley, K. J., Brett, M., & Nichols, T. E. (2008). Guidelines for reporting an fMRI study. NeuroImage, 40(2), 409-414.
- Poldrack, R. A., Mumford, J. A., & Nichols, T. E. (2011). Handbook of Functional MRI Data Analysis. Cambridge University Press.
- Poline, J. B., & Brett, M. (2012). The general linear model and fMRI: Does love last forever? NeuroImage, 62(2), 871-880.
- Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2012). Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage, 59(3), 2142-2154.
- Power, J. D., Mitra, A., Laumann, T. O., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2014). Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage, 84, 320-341.
- Satterthwaite, T. D., Elliott, M. A., Gerraty, R. T., et al. (2013). An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage, 64, 240-256.
- Smith, S. M., & Nichols, T. E. (2009). Threshold-free cluster enhancement: Addressing problems of smoothing, threshold dependence and localisation in cluster inference. NeuroImage, 44(1), 83-98.
- Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal autocorrelation in univariate linear modeling of FMRI data. NeuroImage, 14(6), 1370-1386.
- Worsley, K. J., Marrett, S., Neelin, P., Vandal, A. C., Friston, K. J., & Evans, A. C. (1996). A unified statistical approach for determining significant signals in images of cerebral activation. Human Brain Mapping, 4(1), 58-73.
See references/ for detailed design matrix construction guide and statistical inference methods.