Neuroimaging Power Guide

Sample-size planning for fMRI/EEG studies using effect-size benchmarks and simulation-based power


Purpose

Statistical power in neuroimaging is fundamentally different from power in behavioral research. The massive multiple comparisons problem (testing ~100,000 voxels simultaneously), spatial correlation structure, and non-standard test statistics mean that standard power formulas underestimate required sample sizes. Meanwhile, the field has historically been severely underpowered: the median fMRI study has only ~20% power to detect a typical effect (Button et al., 2013).

A competent programmer without neuroimaging training would apply standard power calculations (e.g., G*Power for a t-test) without accounting for multiple comparison correction, would not know typical effect sizes in neuroimaging, and would dramatically underestimate the sample sizes needed. This skill encodes the domain-specific knowledge for neuroimaging power analysis.

When to Use This Skill

  • Planning sample size for a new fMRI, EEG, or MEG study
  • Estimating power for grant applications or registered reports
  • Determining whether a published study was adequately powered
  • Choosing between ROI-based and whole-brain analysis based on power constraints
  • Evaluating the reliability implications of sample size choices

Research Planning Protocol

Before executing the domain-specific steps below, you MUST:

  1. State the research question — What specific question is this analysis/paradigm addressing?
  2. Justify the method choice — Why is this approach appropriate? What alternatives were considered?
  3. Declare expected outcomes — What results would support vs. refute the hypothesis?
  4. Note assumptions and limitations — What does this method assume? Where could it mislead?
  5. Present the plan to the user and WAIT for confirmation before proceeding.

For detailed methodology guidance, see the research-literacy skill.

⚠️ Verification Notice

This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.

Why Neuroimaging Power Is Different

Standard power analysis assumes a single statistical test. Neuroimaging involves:

| Challenge | Impact on Power | Source |
|---|---|---|
| Massive multiple comparisons | ~100,000 voxels tested; correction reduces sensitivity by orders of magnitude | Nichols & Hayasaka, 2003 |
| Spatial smoothness | Adjacent voxels are correlated, reducing the effective number of independent tests but complicating power calculation | Worsley et al., 1996 |
| Multi-level inference | Subject-level estimation + group-level test; both levels contribute noise | Mumford & Nichols, 2008 |
| Effect size variability | Effects vary across voxels, regions, and subjects; no single "effect size" characterizes a study | Poldrack et al., 2017 |
| Threshold-dependent power | Power depends heavily on the statistical threshold (corrected vs. uncorrected) and correction method | Hayasaka et al., 2007 |

Key implication: A standard G*Power calculation for a two-sample t-test will dramatically overestimate the power of a whole-brain fMRI analysis because it ignores multiple comparison correction (Mumford & Nichols, 2008).
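To see the size of the gap, here is a minimal sketch (Python with scipy; an illustration added here, not part of the original skill) comparing analytic two-sample t-test power at a conventional alpha with the same design evaluated at a naive Bonferroni-corrected per-voxel alpha:

```python
from scipy import stats

def two_sample_power(n_per_group, d, alpha):
    """Analytic power for a two-sided two-sample t-test."""
    df = 2 * n_per_group - 2
    ncp = d * (n_per_group / 2) ** 0.5          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # P(|T| > t_crit) under the noncentral t distribution
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

# Same design, very different answers once the threshold reflects ~100,000 voxels
print(two_sample_power(25, 0.8, alpha=0.05))        # behavioral-style calculation
print(two_sample_power(25, 0.8, alpha=0.05 / 1e5))  # naive per-voxel Bonferroni
```

Real corrections (RFT, cluster-based, FDR) are less severe than raw Bonferroni, but the direction of the error is the same: the behavioral-style number is far too optimistic.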

Typical Effect Sizes in Neuroimaging

fMRI Effect Sizes

| Analysis Type | Typical Effect Size | Unit | Source |
|---|---|---|---|
| Task activation (voxel-level) | Cohen's d = 0.5-1.0 | Standardized mean difference | Poldrack et al., 2017 |
| Task activation (ROI-level) | Cohen's d = 0.5-1.5 | Standardized mean difference | Poldrack et al., 2017 |
| Between-group difference (voxel) | Cohen's d = 0.3-0.8 | Standardized mean difference | Poldrack et al., 2017 |
| Functional connectivity (correlation) | r = 0.2-0.5 | Pearson correlation | Marek et al., 2022 |
| Brain-behavior association | r = 0.1-0.3 | Pearson correlation | Marek et al., 2022 |
| Brain-wide association (replicable) | r < 0.05 at N < 1000 | Pearson correlation | Marek et al., 2022 |

Critical finding: Marek et al. (2022) demonstrated that brain-behavior correlations in typical neuroimaging samples (N < 100) are severely inflated. Replicable brain-behavior associations require N > 2,000 for whole-brain analyses.
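A small simulation makes the inflation mechanism concrete (a sketch assuming numpy, added for illustration): the spread of sample correlations around a modest true effect shrinks only slowly with N, so small samples routinely produce large spurious correlations.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_r(n, true_r, n_sims=1000):
    """Sampling distribution of Pearson r when the true correlation is true_r."""
    x = rng.standard_normal((n_sims, n))
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.standard_normal((n_sims, n))
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(1) / np.sqrt((xc**2).sum(1) * (yc**2).sum(1))

for n in (25, 100, 2000):
    rs = sampled_r(n, true_r=0.1)
    lo, hi = np.percentile(rs, [2.5, 97.5])
    print(f"N={n:5d}: 95% of sample correlations fall in [{lo:+.2f}, {hi:+.2f}]")
```

At N = 25 the interval spans well beyond the entire plausible range of true brain-behavior effects; only at N in the thousands does it tighten around the true value.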

EEG/ERP Effect Sizes

| Analysis Type | Typical Effect Size | Source |
|---|---|---|
| ERP component amplitude (e.g., N400, P300) | Cohen's d = 0.3-0.8 | Boudewyn et al., 2018 |
| ERP latency differences | Cohen's d = 0.2-0.5 | Luck, 2014 |
| EEG oscillatory power | Cohen's d = 0.3-0.6 | Cohen, 2014 |
| EEG connectivity (coherence/PLV) | Cohen's d = 0.2-0.5 | Cohen, 2014 |

Sample Size Benchmarks

fMRI Sample Size Recommendations

| Design | Minimum N | Recommended N | Assumptions | Source |
|---|---|---|---|---|
| Within-subject task activation | 20 | 25-30 | Large effect (d > 0.8), lenient correction | Desmond & Glover, 2002 |
| Between-group comparison (large effect, d = 0.8) | 20 per group | 25-30 per group | Whole-brain, cluster-corrected | Thirion et al., 2007 |
| Between-group comparison (medium effect, d = 0.5) | 40 per group | 50+ per group | Whole-brain, cluster-corrected | Thirion et al., 2007; Poldrack et al., 2017 |
| Resting-state individual differences | 25+ | 50+ (much more for replicability) | Depends on reliability of measure | Marek et al., 2022 |
| Brain-behavior correlations | 100+ | N > 2,000 for replicable whole-brain | Large-scale only | Marek et al., 2022 |
| ROI-based analysis (a priori) | 15-20 | 25+ | Single ROI, no whole-brain correction | Desmond & Glover, 2002 |

EEG/ERP Sample Size Recommendations

| Design | Minimum | Recommended | Source |
|---|---|---|---|
| ERP trials per condition per subject | 30 | 40-60 | Boudewyn et al., 2018 |
| ERP between-group (medium d = 0.5) | 34 per group | 50+ per group | Boudewyn et al., 2018 |
| ERP within-subject (medium d = 0.5) | 25 subjects | 30+ subjects | Luck, 2014 |
| Time-frequency analysis | 40 trials | 60+ trials | Cohen, 2014 |

Power at Common Sample Sizes

| N (per group) | Power for d = 0.5 (uncorrected) | Power for d = 0.5 (corrected, whole-brain) | Power for d = 0.8 (corrected) |
|---|---|---|---|
| 10 | ~26% | < 10% | ~25% |
| 20 | ~50% | ~20% | ~50% |
| 30 | ~70% | ~35% | ~70% |
| 40 | ~82% | ~50% | ~85% |
| 60 | ~94% | ~70% | ~95% |

Values are approximate, based on simulations from Mumford & Nichols (2008) and Desmond & Glover (2002). Exact power depends on design, smoothness, effect spatial extent, and correction method.

Power Decision Tree

What type of analysis are you planning?
|
+-- Whole-brain voxelwise analysis
|   |
|   +-- Within-subject (one-sample t-test)
|   |     --> Minimum N = 20; aim for N = 25-30
|   |         (Desmond & Glover, 2002)
|   |
|   +-- Between-group comparison
|   |   |
|   |   +-- Large expected effect (d > 0.8)
|   |   |     --> N = 20-25 per group (Thirion et al., 2007)
|   |   |
|   |   +-- Medium expected effect (d = 0.5)
|   |   |     --> N = 40-50 per group (Poldrack et al., 2017)
|   |   |
|   |   +-- Small expected effect (d = 0.3)
|   |         --> N = 80+ per group; consider ROI approach
|   |
|   +-- Brain-behavior correlation
|         --> N = 100+ minimum; N > 2,000 for replicability
|             (Marek et al., 2022)
|
+-- ROI-based analysis (a priori regions)
|     --> Use standard power formulas (G*Power) with expected
|         effect size from literature or pilot data.
|         No multiple comparison correction needed for a single ROI.
|         N = 15-30 typical for medium-to-large effects.
|
+-- ERP analysis
    |
    +-- Between-group
    |     --> 30-50 per group for medium effects
    |         (Boudewyn et al., 2018)
    |
    +-- Within-subject
          --> 25-30 subjects, 30+ trials per condition
              (Boudewyn et al., 2018; Luck, 2014)

Simulation-Based Power Approaches

fMRIpower (Mumford & Nichols, 2008)

Estimates power using pilot group-level activation maps:

  1. Run a pilot study (or use published results) to obtain group-level statistical maps
  2. Estimate effect sizes at each voxel from the pilot data
  3. Simulate new datasets with varying N by resampling from the estimated effect size and variance
  4. Apply the full statistical pipeline (including multiple comparison correction) to each simulation
  5. Power = proportion of simulations that detect the effect at a given ROI or voxel

Requirements: Pilot data from at least 10-15 subjects for stable variance estimates (Mumford & Nichols, 2008)
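A toy, single-ROI version of steps 2-5 (hypothetical pilot values, not the actual fMRIpower implementation) looks like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulated_power(pilot_betas, n_new, alpha=0.001, n_sims=2000):
    """Steps 2-5 for one ROI: estimate mean and SD from pilot betas, resample
    groups of size n_new, and count how often a one-sample t-test rejects."""
    mu = np.mean(pilot_betas)
    sd = np.std(pilot_betas, ddof=1)
    hits = 0
    for _ in range(n_sims):
        sample = rng.normal(mu, sd, size=n_new)
        t, p = stats.ttest_1samp(sample, 0.0)
        hits += (p < alpha) and (t > 0)
    return hits / n_sims

# Hypothetical pilot ROI betas from 12 subjects (illustrative values only)
pilot = np.array([0.9, 0.4, 1.7, -0.3, 0.8, 1.1, 0.2, 0.6, 1.4, 0.1, 0.7, 0.5])
for n in (20, 30, 40):
    print(n, simulated_power(pilot, n))
```

The real method does this at every voxel with the full preprocessing and correction pipeline, which is what makes the pilot-sample requirement above matter: noisy variance estimates propagate directly into the power curve.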

NeuroPowerTools (Durnez et al., 2016)

Web-based tool for peak-based power estimation:

  1. Upload an unthresholded statistical map from a pilot or published study
  2. The tool fits a mixture model to the peak distribution (null + alternative)
  3. Estimates the proportion of truly active voxels and their average effect size
  4. Computes power for new studies with varying N and thresholds

Advantage: Does not require individual subject data; can use published group maps.
URL: https://neuropowertools.org

Permutation-Based Power (Hayasaka et al., 2007)

  1. Generate simulated datasets under the alternative hypothesis using effect size maps from pilot data
  2. For each simulated dataset, run a full permutation test (5,000+ permutations)
  3. Compute power as the proportion of simulations in which the permutation test rejects the null

Advantage: Fully nonparametric; accounts for the exact multiple comparison correction used.
Disadvantage: Computationally expensive (requires running thousands of permutation tests per power estimate).
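The nested structure (simulation loop outside, permutation loop inside) can be sketched for a single one-sample test with sign-flipping (a toy illustration, not a whole-brain implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

def permutation_power(mu, sigma, n, n_sims=200, n_perm=1000, alpha=0.05):
    """Outer loop: simulate data under the alternative; inner loop: sign-flip
    permutation test of the mean. Power = fraction of simulations rejecting."""
    rejections = 0
    for _ in range(n_sims):
        data = rng.normal(mu, sigma, size=n)
        t_obs = data.mean()
        flips = rng.choice([-1.0, 1.0], size=(n_perm, n))
        t_null = (flips * data).mean(axis=1)
        p = (np.sum(t_null >= t_obs) + 1) / (n_perm + 1)
        rejections += p < alpha
    return rejections / n_sims

print(permutation_power(0.5, 1.0, n=30))  # medium effect, N = 30
```

The computational cost is visible even in this toy: each power estimate runs n_sims full permutation tests, and a whole-brain version multiplies that by the image dimensions.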

PowerMap (Joyce & Hayasaka, 2012)

Simulation-based power using parametric assumptions:

  1. Specify effect size map (from pilot data or assumed values)
  2. Specify noise model (based on residuals from pilot data)
  3. Simulate datasets with varying N
  4. Apply parametric statistical testing with specified correction method
  5. Estimate power at each voxel

Multiple Comparison Correction Impact on Power

The choice of correction method dramatically affects required sample size:

| Correction Method | Effective Alpha per Voxel | Relative Power | Source |
|---|---|---|---|
| None (p < 0.001 uncorrected) | 0.001 | Highest (but invalid inference) | -- |
| FDR q < 0.05 | ~0.0001-0.001 (data-dependent) | Moderate-high | Genovese et al., 2002 |
| Cluster-based (CDT p < 0.001) | Depends on cluster size | Moderate-high for large effects | Eklund et al., 2016 |
| Voxelwise FWE (RFT, p < 0.05) | ~5 × 10^-8 | Low | Worsley et al., 1996 |
| TFCE + permutation | Varies | Moderate | Smith & Nichols, 2009 |

Domain insight: Switching from voxelwise FWE to cluster-based or FDR correction can increase power by 50-200% for the same sample size, because these methods exploit the spatial extent of true activations (Nichols & Hayasaka, 2003).
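The sensitivity gap can be illustrated with a toy simulation of independent tests (an added sketch; it ignores spatial correlation, which is exactly what cluster-based methods exploit, so it understates their advantage):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# 10,000 independent "voxels", 10% truly active; z-statistics approximate a
# one-sample group test with N = 30 subjects and d = 0.8 at active voxels
n_vox, n_sub, d = 10_000, 30, 0.8
active = rng.random(n_vox) < 0.10
z = rng.standard_normal(n_vox) + active * d * np.sqrt(n_sub)
p = stats.norm.sf(z)

# Bonferroni (a crude stand-in for voxelwise FWE)
bonf_sig = p < 0.05 / n_vox

# Benjamini-Hochberg FDR at q < 0.05
order = np.argsort(p)
below = p[order] <= 0.05 * np.arange(1, n_vox + 1) / n_vox
k = np.flatnonzero(below).max() + 1 if below.any() else 0
fdr_sig = np.zeros(n_vox, bool)
fdr_sig[order[:k]] = True

print("true positives  Bonferroni:", int((bonf_sig & active).sum()),
      " FDR:", int((fdr_sig & active).sum()))
```

Because the BH threshold is never stricter than the Bonferroni one, FDR always recovers at least as many true activations, and typically far more when many voxels are truly active.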

Test-Retest Reliability and Power

For individual differences designs (correlating brain measures with behavior), reliability of the brain measure is critical (Elliott et al., 2020):

| Measure | Typical ICC | Implication | Source |
|---|---|---|---|
| Task fMRI activation (ROI) | 0.3-0.6 | Poor to moderate reliability | Elliott et al., 2020 |
| Resting-state connectivity | 0.3-0.7 | Moderate reliability; depends on scan duration | Elliott et al., 2020 |
| ERP amplitude | 0.5-0.8 | Moderate to good | Cassidy et al., 2012 |
| EEG oscillatory power | 0.6-0.9 | Good to excellent | Cohen, 2014 |

Critical formula: An observed brain-behavior correlation is attenuated by the reliabilities of both measures:

r_observed = r_true * sqrt(reliability_brain * reliability_behavior)

so the maximum detectable correlation (when r_true = 1) is sqrt(reliability_brain * reliability_behavior). With brain ICC = 0.5 and behavior reliability = 0.8, even a true correlation of r = 0.5 would appear as r = 0.5 * sqrt(0.5 * 0.8) ≈ 0.32 on average (Elliott et al., 2020). This attenuation means far larger samples are needed.
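The sample-size cost of attenuation can be sketched with the standard Fisher-z approximation N ≈ ((z_alpha/2 + z_power) / atanh(r))^2 + 3 (a textbook approximation, not a formula from this skill):

```python
import math
from scipy import stats

def required_n_for_r(r, alpha=0.05, power=0.8):
    """Approximate N to detect a correlation r (Fisher z approximation)."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return math.ceil(((z_a + z_b) / math.atanh(r)) ** 2 + 3)

r_true = 0.5
r_observed = r_true * math.sqrt(0.5 * 0.8)   # attenuated by ICC = 0.5, rel = 0.8
print(required_n_for_r(r_true), required_n_for_r(r_observed))
```

In this example, attenuation from ICC = 0.5 and behavioral reliability = 0.8 more than doubles the required N.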

Recommendation: For individual differences designs, collect longer scan sessions (at least 20-30 minutes of resting-state data; Birn et al., 2013) or use multi-session data to improve reliability.

Practical Power Calculation Workflow

For a New fMRI Study

  1. Define the primary analysis: Whole-brain voxelwise or ROI-based?
  2. Estimate the effect size:
  • From pilot data (preferred): extract effect sizes from pilot activation maps
  • From literature: find the most comparable published study; correct for publication bias by assuming the true effect is ~50-75% of the published estimate (Button et al., 2013)
  • From meta-analysis: use NeuroSynth or BrainMap to estimate typical activation strength
  3. Choose the power analysis tool:
  • ROI-based: standard power calculation (G*Power) using the estimated effect size at the ROI
  • Whole-brain: fMRIpower, NeuroPowerTools, or simulation
  4. Set target power: 80% (conventional) or 90% (recommended for costly neuroimaging studies)
  5. Account for attrition: add 10-20% to the planned N for participant exclusions due to excessive motion, incomplete data, or technical failures
  6. Report: effect size source, power tool used, correction method, target power, final N
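For an ROI-level between-group test (no whole-brain correction), the middle steps can be chained in a few lines (a sketch using a normal-approximation sample-size formula; the deflation factor and attrition allowance are the ones suggested above):

```python
import math
from scipy import stats

def n_for_two_sample(d, alpha=0.05, power=0.8):
    """Per-group N for a two-sided two-sample t-test (normal approximation)."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

d_published = 0.8
d_planning = 0.6 * d_published       # winner's-curse deflation (~50-75% rule)
n = n_for_two_sample(d_planning)     # analytic tool, 80% power
n_recruit = math.ceil(n * 1.15)      # 15% attrition allowance
print(d_planning, n, n_recruit)
```

Note how quickly a "large" published effect turns into a medium planning effect and a much larger recruitment target.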

For a New EEG/ERP Study

  1. Estimate effect size: From pilot data or published ERP studies (see effect size table above)
  2. Determine trial count: At least 30 trials per condition post-rejection (Boudewyn et al., 2018)
  3. Plan for trial attrition: Assume 20-30% trial rejection rate; collect accordingly
  4. Subject-level power: Use G*Power with the estimated within- or between-subject effect size
  5. Account for subject attrition: Add 15-20% for exclusions due to excessive artifacts
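Step 3 is simple arithmetic but is the step most often skipped; as a sketch (target and rejection rate are illustrative picks from the ranges above):

```python
import math

target_clean_trials = 40      # per condition, post-rejection (recommended range)
rejection_rate = 0.25         # assumed mid-range of the 20-30% rejection rate
trials_to_collect = math.ceil(target_clean_trials / (1 - rejection_rate))
print(trials_to_collect)
```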

Common Pitfalls

  1. Using uncorrected power estimates for whole-brain analyses: A study with 80% power at p < 0.001 uncorrected has far less than 80% power after FWE or FDR correction (Mumford & Nichols, 2008)
  2. Ignoring effect size inflation in pilot studies: Small pilot studies produce inflated effect sizes due to the "winner's curse." Assume the true effect is 50-75% of the pilot estimate (Button et al., 2013)
  3. Applying behavioral power formulas to neuroimaging: Standard t-test power calculations dramatically overestimate power for whole-brain analyses because they ignore multiple comparison correction
  4. Not accounting for participant attrition: In fMRI, 10-20% of participants may be excluded due to motion, scanner artifacts, or incomplete data. Over-recruit accordingly
  5. Ignoring reliability for individual differences: Brain measures with ICC < 0.5 attenuate correlations, requiring much larger samples than traditional power analysis suggests (Elliott et al., 2020)
  6. Assuming published sample sizes are adequate: Most published fMRI studies are underpowered (median power ~20%; Button et al., 2013). Do not use published N as a benchmark
  7. Neglecting the impact of design efficiency: An optimized event-related design can be 2-3x more efficient than a suboptimal one (Dale, 1999), effectively increasing power without adding subjects
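Pitfall 7's efficiency can be computed before any subject is scanned: Dale (1999) quantifies design efficiency as the inverse of the contrast estimate's variance, 1 / c(X'X)^-1 c'. The sketch below (HRF shape, timings, and scan parameters are illustrative assumptions, not values from this skill) compares a fixed-ISI schedule with randomized onsets for one event type:

```python
import numpy as np
from scipy.stats import gamma

def hrf(dt=0.1, duration=30.0):
    """Toy double-gamma HRF (a common default shape; illustrative only)."""
    t = np.arange(0, duration, dt)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6

def efficiency(onsets, n_scans=200, tr=2.0, dt=0.1):
    """Design efficiency 1 / (c (X'X)^-1 c') for one event type vs. baseline."""
    stick = np.zeros(int(n_scans * tr / dt))
    stick[np.round(np.asarray(onsets) / dt).astype(int)] = 1.0
    reg = np.convolve(stick, hrf(dt))[: len(stick)][:: int(tr / dt)]
    X = np.column_stack([reg, np.ones_like(reg)])   # regressor + intercept
    c = np.array([[1.0, 0.0]])
    return 1.0 / (c @ np.linalg.inv(X.T @ X) @ c.T).item()

rng = np.random.default_rng(4)
fixed = np.arange(10.0, 390.0, 20.0)                           # fixed 20 s ISI
random_onsets = np.sort(rng.uniform(10.0, 390.0, fixed.size))  # randomized timing
print(efficiency(fixed), efficiency(random_onsets))
```

Which schedule wins depends on whether the goal is detecting a known response shape or estimating an unknown one; the point is that efficiency is computable at design time, so the 2-3x gains Dale describes can be found by searching over schedules rather than adding subjects.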

Minimum Reporting Checklist

  • Target effect size and its source (pilot data, literature, meta-analysis)
  • Effect size metric used (Cohen's d, r, partial eta-squared)
  • Power analysis method (analytical, simulation-based, tool used)
  • Target power level (typically 80% or 90%)
  • Statistical test assumed (one-sample t, two-sample t, correlation, ANOVA)
  • Multiple comparison correction method and parameters
  • Planned N and justification
  • Attrition allowance (expected exclusion rate)
  • For simulation-based: number of simulations, pilot data source, software
  • For reliability-dependent designs: reliability estimates and their source

References

  • Birn, R. M., Molloy, E. K., Patriat, R., et al. (2013). The effect of scan length on the reliability of resting-state fMRI connectivity estimates. NeuroImage, 83, 550-558.
  • Boudewyn, M. A., Luck, S. J., Farrens, J. L., & Kappenman, E. S. (2018). How many trials does it take to get a significant ERP effect? Psychophysiology, 55(6), e13049.
  • Button, K. S., Ioannidis, J. P. A., Mokrysz, C., et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
  • Cassidy, S. M., Robertson, I. H., & O'Connell, R. G. (2012). Retest reliability of event-related potentials: Evidence from a variety of paradigms. Psychophysiology, 49(5), 659-664.
  • Cohen, M. X. (2014). Analyzing Neural Time Series Data: Theory and Practice. MIT Press.
  • Dale, A. M. (1999). Optimal experimental design for event-related fMRI. Human Brain Mapping, 8(2-3), 109-114.
  • Desmond, J. E., & Glover, G. H. (2002). Estimating sample size in functional MRI (fMRI) neuroimaging studies. Journal of Neuroscience Methods, 118(2), 115-128.
  • Durnez, J., Degryse, J., Moerkerke, B., et al. (2016). Power and sample size calculations for fMRI studies based on the prevalence of active peaks. bioRxiv, 049429.
  • Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. PNAS, 113(28), 7900-7905.
  • Elliott, M. L., Knodt, A. R., Ireland, D., et al. (2020). What is the test-retest reliability of common task-functional MRI measures? Biological Psychiatry, 87(11), 934-948.
  • Genovese, C. R., Lazar, N. A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage, 15(4), 870-878.
  • Hayasaka, S., Peiffer, A. M., Hugenschmidt, C. E., & Laurienti, P. J. (2007). Power and sample size calculation for neuroimaging studies by non-central random field theory. NeuroImage, 37(3), 721-730.
  • Joyce, K. E., & Hayasaka, S. (2012). Development of PowerMap: A software package for statistical power calculation in neuroimaging studies. Neuroinformatics, 10(4), 351-365.
  • Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique (2nd ed.). MIT Press.
  • Marek, S., Tervo-Clemmens, B., Calabro, F. J., et al. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature, 603(7902), 654-660.
  • Mumford, J. A., & Nichols, T. E. (2008). Power calculation for group fMRI studies accounting for arbitrary design and temporal autocorrelation. NeuroImage, 39(1), 261-268.
  • Nichols, T. E., & Hayasaka, S. (2003). Controlling the familywise error rate in functional neuroimaging: A comparative review. Statistical Methods in Medical Research, 12(5), 419-446.
  • Poldrack, R. A., Baker, C. I., Durnez, J., et al. (2017). Scanning the horizon: Towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2), 115-126.
  • Smith, S. M., & Nichols, T. E. (2009). Threshold-free cluster enhancement. NeuroImage, 44(1), 83-98.
  • Thirion, B., Pinel, P., Meriaux, S., et al. (2007). Analysis of a large fMRI cohort: Statistical and methodological issues for group analyses. NeuroImage, 35(1), 105-120.
  • Worsley, K. J., Marrett, S., Neelin, P., et al. (1996). A unified statistical approach for determining significant signals in images of cerebral activation. Human Brain Mapping, 4(1), 58-73.

See references/ for detailed simulation examples and effect size lookup tables.
