Bayesian Cognitive Model Builder

Domain-validated guidance for building hierarchical Bayesian cognitive models with Stan/PyMC: prior specification, model structure, MCMC diagnostics, and posterior predictive checks

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "Bayesian Cognitive Model Builder" with this command: npx skills add haoxuanlithuai/awesome_cognitive_and_neuroscience_skills/haoxuanlithuai-awesome-cognitive-and-neuroscience-skills-bayesian-cognitive-model-builder

Purpose

This skill encodes expert knowledge for building hierarchical Bayesian cognitive models using probabilistic programming languages (Stan, PyMC). It addresses the modeling decisions that require domain expertise beyond knowing Stan/PyMC syntax: how to choose priors that respect cognitive constraints, when to use hierarchical structure, how to diagnose MCMC pathologies, and how to evaluate model adequacy through posterior predictive checks.

Without cognitive-modeling training, a competent programmer would get the following wrong: which prior families are appropriate for cognitive parameters (e.g., reaction times must be positive, learning rates are bounded in [0,1]), when partial pooling outperforms complete pooling or no pooling, how to detect non-identifiability in cognitive models, and what constitutes adequate MCMC convergence for publishable results.

When to Use This Skill

  • Building a generative model of a cognitive process (decision-making, learning, memory, perception) where parameters have psychological interpretations
  • Estimating individual differences in cognitive parameters while borrowing strength across participants (hierarchical/multilevel models)
  • Working with small samples or sparse data per participant where regularization through priors prevents overfitting
  • Drawing scientific conclusions for which parameter uncertainty matters (credible intervals, not just point estimates)
  • Comparing competing cognitive models via information criteria (LOO-CV, WAIC) or Bayes factors
  • Fitting established cognitive models (DDM, signal detection, reinforcement learning, multinomial processing trees) in a Bayesian framework

When NOT to Use This Skill

  • If your model has a closed-form MLE and you have large, balanced samples, frequentist estimation may be simpler and adequate
  • For purely predictive models where parameter interpretability is irrelevant (consider machine learning approaches)
  • If you need a general-purpose Bayesian regression model without cognitive process parameters (see cogsci-statistics skill)
  • For EEG/fMRI analysis pipelines without explicit cognitive models (see erp-analysis or fmri-glm-analysis-guide skills)

Research Planning Protocol

Before executing the domain-specific steps below, you MUST:

  1. State the research question -- What cognitive mechanism is this model capturing?
  2. Justify the method choice -- Why Bayesian (not MLE, not frequentist)? What alternatives were considered?
  3. Declare expected outcomes -- What parameter patterns would support vs. refute the hypothesis?
  4. Note assumptions and limitations -- What does this model assume about the cognitive process?
  5. Present the plan to the user and WAIT for confirmation before proceeding.

For detailed methodology guidance, see the research-literacy skill.

⚠️ Verification Notice

This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.

Model Structure Decision Tree

Choosing the right level of pooling is a fundamental modeling decision that non-specialists routinely get wrong.

Step 1: Do You Have Grouped Data?

If your data has a natural grouping structure (e.g., multiple trials per participant, participants within conditions), you need to decide on a pooling strategy. If not, fit a single model.

Step 2: Choose the Pooling Level

| Strategy | Structure | When Appropriate | Risk |
| --- | --- | --- | --- |
| Complete pooling | One set of parameters for all participants | Large homogeneous groups; individual differences are a nuisance | Ignores meaningful individual variation; biased group estimates if heterogeneity exists (Gelman et al., 2013, Ch. 5) |
| No pooling | Separate parameters per participant | Many trials per participant (>200); individual-level inference is the goal | Noisy estimates for participants with few trials; no borrowing of strength (Gelman et al., 2013, Ch. 5) |
| Partial pooling (hierarchical) | Individual parameters drawn from a group distribution | Default choice for cognitive modeling; few-to-moderate trials per participant; individual differences are scientifically meaningful | Requires MCMC; potential convergence issues with centered parameterization (Gelman et al., 2013, Ch. 5) |

Critical domain knowledge: Hierarchical (partial pooling) models should be the default in cognitive science. They automatically regularize extreme individual estimates toward the group mean -- a property called "shrinkage" -- which is especially valuable with typical cognitive science sample sizes of 20-40 participants with 50-200 trials each (Lee & Wagenmakers, 2014, Ch. 8).
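The shrinkage property can be made concrete with a small Python sketch. For a Normal hierarchical model with known within-participant SD sigma and group-level SD tau (all values below are illustrative), the partial-pooling estimate of a participant's mean is a precision-weighted average of that participant's own mean and the group mean:

```python
def partial_pool(ybar, n, sigma, mu, tau):
    """Posterior-mean shrinkage estimate for one participant under a
    Normal hierarchical model with known within-participant SD (sigma)
    and group-level SD (tau). ybar: participant's observed mean;
    n: number of trials contributed by that participant."""
    w_data = n / sigma**2    # precision of the participant's own data
    w_group = 1.0 / tau**2   # precision of the group-level distribution
    return (w_data * ybar + w_group * mu) / (w_data + w_group)

# The same extreme observed mean (0.9) is shrunk toward the group
# mean (0.5) much more strongly when it rests on fewer trials
est_sparse = partial_pool(ybar=0.9, n=20, sigma=1.0, mu=0.5, tau=0.2)
est_dense = partial_pool(ybar=0.9, n=500, sigma=1.0, mu=0.5, tau=0.2)
```

A participant contributing 20 trials is pulled much further toward the group mean than one contributing 500, which is exactly the regularization that protects sparse-data estimates.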

Step 3: Centered vs. Non-Centered Parameterization

For hierarchical models, the parameterization choice affects MCMC efficiency:

  • Centered parameterization: theta_j ~ Normal(mu, sigma). Use when there are many observations per group (>100 trials per participant) and the data are informative relative to the prior (Betancourt & Girolami, 2015).
  • Non-centered parameterization: theta_j = mu + sigma * eta_j where eta_j ~ Normal(0, 1). Use when there are few observations per group, the group-level variance is small, or you encounter divergent transitions with centered parameterization (Betancourt & Girolami, 2015; Stan User's Guide, Section 1.13).

When in doubt, use non-centered parameterization. It is more robust across a wider range of data configurations and is the Stan Development Team's default recommendation.
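The identity behind the non-centered form can be illustrated with forward draws in numpy. In Stan or PyMC the same transform is applied to the sampled parameters, which is where the geometric benefit arises; the values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, J = 0.3, 0.05, 10_000

# Centered: sample each participant's theta_j directly from Normal(mu, sigma)
theta_centered = rng.normal(mu, sigma, size=J)

# Non-centered: sample a standardized offset eta_j ~ Normal(0, 1), then
# scale and shift. The sampler explores eta_j, which is decorrelated
# from (mu, sigma) and avoids the funnel geometry.
eta = rng.normal(0.0, 1.0, size=J)
theta_noncentered = mu + sigma * eta

# Both parameterizations imply the same distribution for theta_j
```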

Prior Selection Principles

General Philosophy

Use weakly informative priors that encode known constraints without dominating the likelihood. The goal is to rule out impossible or implausible parameter values while remaining agnostic about the precise value (Gelman et al., 2008; Gelman et al., 2013, Ch. 2).

Domain-critical principle: Cognitive parameters have natural constraints that generic "flat" or "diffuse" priors violate. Reaction times cannot be negative. Probabilities must lie in [0,1]. Learning rates are bounded. Firing rates are non-negative. Encoding these constraints in the prior is not "being subjective" -- it is encoding physical and psychological reality (Lee & Wagenmakers, 2014, Ch. 4).

Prior Families for Common Cognitive Parameter Types

| Parameter Type | Recommended Prior | Rationale | Source |
| --- | --- | --- | --- |
| Location (unbounded) | Normal(0, sd) or Student-t(3, 0, sd) | Weakly informative; heavier tails with Student-t for robustness | Gelman et al., 2008 |
| Scale / variance | Half-Normal(0, sd) or Half-Cauchy(0, sd) | Positive-only; Half-Cauchy allows heavier tails for group-level SDs | Gelman, 2006; Polson & Scott, 2012 |
| Probability (0 to 1) | Beta(a, b) | Natural conjugate for the binomial; Beta(1,1) = Uniform; Beta(2,2) = weakly informative, centered at 0.5 | Kruschke, 2015, Ch. 6 |
| Rate (0 to 1) | Beta(1.1, 1.1) or logit-Normal | Gently regularizes away from the boundaries | Gelman et al., 2013, Ch. 2 |
| Positive continuous | Gamma(shape, rate) or Lognormal(mu, sigma) | For RT, non-decision time, threshold parameters | Lee & Wagenmakers, 2014, Ch. 4 |
| Correlation matrix | LKJ(eta) | eta=1: uniform over correlation matrices; eta=2: weakly informative (Stan default recommendation) | Lewandowski et al., 2009; Stan User's Guide |
| Simplex (sums to 1) | Dirichlet(alpha) | alpha=1: uniform on the simplex; alpha>1: concentrates toward the center | Gelman et al., 2013, Ch. 2 |

For detailed cognitive-domain-specific prior tables, see references/prior-selection-guide.md.

Prior Predictive Checking

Always run a prior predictive check before fitting to data (Schad et al., 2021; Gabry et al., 2019):

  1. Sample parameters from your priors (no data)
  2. Simulate data from the model using those parameters
  3. Check: Does the simulated data look plausible for the domain?
  • If the prior predicts impossible RTs (e.g., negative, or > 60 seconds), the prior is too diffuse
  • If the prior predicts accuracy always near 50% or always near 100%, reconsider
  4. Iterate on priors until prior predictive distributions cover plausible data ranges without including absurd values
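The steps above can be sketched in a few lines of numpy for a lognormal RT model. The prior values here are hypothetical placeholders, not recommendations:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_trials = 1000, 100

# Step 1: sample parameters from (hypothetical) priors
mu = rng.normal(-0.5, 0.5, size=n_draws)             # log-mean of RT in seconds
sigma = np.abs(rng.normal(0.0, 0.3, size=n_draws))   # half-Normal log-SD

# Step 2: simulate RTs from the model at each prior draw
rt = rng.lognormal(mean=mu[:, None], sigma=sigma[:, None],
                   size=(n_draws, n_trials))

# Step 3: check that simulated data stay in a plausible range
median_rt = np.median(rt)
frac_absurd = np.mean(rt > 60.0)   # fraction of simulated RTs over 60 s
print(f"median RT: {median_rt:.2f} s, fraction > 60 s: {frac_absurd:.4f}")
```

The lognormal likelihood already rules out negative RTs; the check here is that the priors do not push the bulk of the prior predictive mass into implausibly fast or slow responses.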

Common Cognitive Models in Bayesian Framework

Drift-Diffusion Model (DDM / HDDM)

  • Key parameters: drift rate (v), boundary separation (a), non-decision time (t), starting point bias (z)
  • Bayesian implementation: HDDM package (Wiecki et al., 2013) uses informative priors from empirical meta-analysis (Matzke & Wagenmakers, 2009)
  • Typical priors: See references/prior-selection-guide.md for parameter-specific recommendations
  • Critical note: Within-trial noise (s) is a scaling parameter fixed by convention at 0.1 (Ratcliff, 1978) or 1.0 (Navarro & Fuss, 2009). All other parameter ranges depend on this choice.
  • Also see the drift-diffusion-model skill for detailed DDM guidance
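For intuition about the generative process (not for fitting, where the analytic likelihood of Navarro & Fuss or the HDDM package is used), a single DDM trial can be simulated by Euler-Maruyama discretization; the parameter values below are illustrative:

```python
import numpy as np

def simulate_ddm_trial(v, a, t0, z, s=1.0, dt=1e-3, rng=None):
    """Euler-Maruyama simulation of one drift-diffusion trial.
    v: drift rate, a: boundary separation, t0: non-decision time,
    z: starting point as a fraction of a, s: within-trial noise
    (fixed by convention; other parameter scales depend on it)."""
    rng = rng or np.random.default_rng()
    x, t = z * a, 0.0
    while 0.0 < x < a:
        x += v * dt + s * np.sqrt(dt) * rng.normal()
        t += dt
    choice = 1 if x >= a else 0   # 1 = upper boundary, 0 = lower
    return choice, t + t0

rng = np.random.default_rng(42)
trials = [simulate_ddm_trial(v=1.5, a=1.0, t0=0.3, z=0.5, rng=rng)
          for _ in range(200)]
accuracy = np.mean([c for c, _ in trials])   # upper-boundary responses
```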

Signal Detection Theory (SDT)

  • Key parameters: sensitivity (d'), criterion (c)
  • Priors for d': Normal(0, 2) is weakly informative; typical empirical values range 0 to 4 (Macmillan & Creelman, 2005)
  • Priors for c: Normal(0, 1.5) centered at no bias; typical range -2 to 2 (Macmillan & Creelman, 2005)
  • Hierarchical structure: Individual d' and c drawn from group distributions (Rouder & Lu, 2005)
  • Also see the signal-detection-analysis skill
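A minimal scipy computation of d' and c from raw counts, using a log-linear correction for extreme rates (one common convention; Macmillan & Creelman, 2005, discuss alternatives):

```python
from scipy.stats import norm

def sdt_params(hits, misses, fas, crs):
    """Sensitivity (d') and criterion (c) from a yes/no detection task.
    Adds 0.5 to each count (log-linear correction) so that hit or
    false-alarm rates of exactly 0 or 1 do not yield infinite z-scores."""
    hr = (hits + 0.5) / (hits + misses + 1.0)    # corrected hit rate
    far = (fas + 0.5) / (fas + crs + 1.0)        # corrected false-alarm rate
    d_prime = norm.ppf(hr) - norm.ppf(far)
    c = -0.5 * (norm.ppf(hr) + norm.ppf(far))
    return d_prime, c

d_prime, c = sdt_params(hits=75, misses=25, fas=20, crs=80)
```

In a hierarchical Bayesian fit these quantities are parameters of the model rather than point computations, but the same z-transform defines them.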

Multinomial Processing Trees (MPT)

  • Key parameters: Processing probabilities (all in [0,1])
  • Priors: Beta(1,1) for non-informative or Beta(a,b) with shape informed by prior studies (Klauer, 2010)
  • Hierarchical extension: Latent-trait MPT with probit-transformed parameters drawn from multivariate normal (Klauer, 2010)

Item Response Theory (IRT)

  • Key parameters: Ability (theta), difficulty (b), discrimination (a)
  • Priors: theta ~ Normal(0,1) by convention; b ~ Normal(0, 2); a ~ Lognormal(0, 0.5) to enforce positivity (de Boeck & Wilson, 2004)
  • Cognitive application: Modeling learning, cognitive ability, or item difficulty in memory/attention tasks
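The 2PL response function is compact enough to state directly (a numpy sketch with illustrative values):

```python
import numpy as np

def p_correct_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of a correct
    response given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# An item whose difficulty equals the respondent's ability is answered
# correctly with probability 0.5, regardless of discrimination
p_at_ability = p_correct_2pl(theta=0.0, a=1.5, b=0.0)
p_easy = p_correct_2pl(theta=0.0, a=1.5, b=-2.0)   # much easier item
```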

Reinforcement Learning (RL)

  • Key parameters: Learning rate (alpha in [0,1]), inverse temperature (beta > 0), decay, perseveration
  • Priors for alpha: Beta(1.1, 1.1) weakly informative on [0,1] (Daw, 2011; Gershman, 2016)
  • Priors for beta (inverse temperature): Gamma(2, 1) or Lognormal(0, 1) constraining to positive values; typical range 0.5 to 20 (Daw, 2011)
  • Critical note: Learning rate and inverse temperature are often poorly identifiable in standard Q-learning; consider reparameterization or strong priors (Daw, 2011; Wilson & Collins, 2019)
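The core likelihood components of a simple Q-learning model are a delta-rule value update and a softmax choice rule. A numpy sketch with illustrative values (in a real fit these are evaluated per trial inside the Stan/PyMC model):

```python
import numpy as np

def softmax_choice_probs(q, beta):
    """Softmax over action values with inverse temperature beta;
    larger beta makes choices more deterministic."""
    z = beta * q
    z = z - z.max()            # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def q_update(q, action, reward, alpha):
    """Delta-rule update of the chosen action's value with learning
    rate alpha in [0, 1]."""
    q = q.copy()
    q[action] += alpha * (reward - q[action])
    return q

q = np.array([0.0, 0.0])
q = q_update(q, action=0, reward=1.0, alpha=0.3)   # q becomes [0.3, 0.0]
p = softmax_choice_probs(q, beta=5.0)
```

Note how alpha and beta interact: early in learning, a larger beta amplifies small value differences just as a larger alpha creates them faster, which is one source of the identifiability problem noted above.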

MCMC Diagnostics

Every Bayesian analysis requires thorough convergence diagnostics. Never report posterior summaries without first verifying convergence. See references/diagnostics-checklist.md for the full step-by-step protocol.

Minimum Convergence Criteria

| Diagnostic | Threshold | Interpretation | Source |
| --- | --- | --- | --- |
| R-hat (split R-hat) | < 1.01 | Between-chain vs. within-chain variance; values > 1.01 indicate non-convergence | Vehtari et al., 2021 |
| Bulk-ESS | > 400 (100 per chain with 4 chains) | Effective independent draws for posterior mean/median estimation | Vehtari et al., 2021 |
| Tail-ESS | > 400 | Effective draws for tail quantiles (credible intervals) | Vehtari et al., 2021 |
| Divergent transitions | 0 | Any divergences indicate the sampler failed to explore the posterior faithfully | Betancourt, 2017 |
| E-BFMI | > 0.3 | Energy Bayesian Fraction of Missing Information; low values indicate poor exploration | Betancourt, 2017 |
| Tree depth saturation | Rare (< 1% of transitions) | Hitting maximum tree depth suggests difficult geometry | Stan User's Guide |

Critical domain knowledge: The older threshold of R-hat < 1.1 is outdated. Vehtari et al. (2021) demonstrated that the traditional R-hat can miss convergence failures. Use the rank-normalized split R-hat with a threshold of 1.01 and always report both bulk-ESS and tail-ESS.
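For transparency about what the diagnostic measures, here is the basic split R-hat in numpy, without the rank normalization and folding of Vehtari et al. (2021); in practice, use ArviZ or Stan's built-in diagnostics rather than this sketch:

```python
import numpy as np

def split_rhat(chains):
    """Basic split R-hat for draws of one parameter.
    chains: array of shape (n_chains, n_iterations)."""
    n_chains, n_iter = chains.shape
    half = n_iter // 2
    # Split each chain in half so within-chain trends inflate R-hat
    split = chains[:, :2 * half].reshape(n_chains * 2, half)
    w = split.var(axis=1, ddof=1).mean()        # mean within-chain variance
    b = half * split.mean(axis=1).var(ddof=1)   # between-chain variance
    var_plus = (half - 1) / half * w + b / half
    return np.sqrt(var_plus / w)

rng = np.random.default_rng(3)
good = rng.normal(0.0, 1.0, size=(4, 1000))   # stationary, well-mixed chains
bad = good + np.arange(4)[:, None]            # chains stuck at different levels
```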

When Diagnostics Fail

See references/diagnostics-checklist.md for remediation steps. The most common fixes in cognitive modeling:

  1. Divergent transitions --> Switch to non-centered parameterization; increase adapt_delta to 0.95-0.99
  2. Low ESS --> Run longer chains; check for multimodality; reparameterize
  3. High R-hat --> Run more iterations; check for label switching in mixture models
  4. E-BFMI warning --> Reparameterize; consider reducing model complexity

Model Comparison

Information Criteria (Preferred for Most Applications)

| Method | When to Use | Implementation | Source |
| --- | --- | --- | --- |
| PSIS-LOO-CV | Default choice for comparing predictive accuracy; more robust than WAIC with weak priors or influential observations | loo package (R), az.loo (Python/ArviZ) | Vehtari et al., 2017 |
| WAIC | Asymptotically equivalent to LOO; acceptable when PSIS diagnostics are clean (all Pareto k < 0.7) | loo package (R), az.waic (Python/ArviZ) | Watanabe, 2010; Vehtari et al., 2017 |
| Bayes factors | Testing a precise null hypothesis (e.g., parameter = 0); sensitive to prior specification | Bridge sampling; Savage-Dickey density ratio | Kass & Raftery, 1995; Lee & Wagenmakers, 2014, Ch. 7 |

Critical domain knowledge: Prefer LOO-CV over WAIC for cognitive models. Vehtari et al. (2017) showed that PSIS-LOO is more robust in the finite-sample case, especially with weak priors or influential observations common in cognitive data. Always check the Pareto k diagnostic: values > 0.7 indicate unreliable LOO estimates for those observations.
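In practice, use the loo package or ArviZ as listed above. For understanding what WAIC actually computes, here is a numpy/scipy sketch from a pointwise log-likelihood matrix (the simulated matrix is a stand-in for real model output):

```python
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    """WAIC from a (n_draws, n_observations) pointwise log-likelihood
    matrix (Watanabe, 2010; notation as in Vehtari et al., 2017).
    Returns elpd_waic and the effective number of parameters p_waic."""
    n_draws = log_lik.shape[0]
    # log pointwise predictive density: log of the posterior-mean likelihood
    lppd = logsumexp(log_lik, axis=0) - np.log(n_draws)
    # penalty: posterior variance of the log-likelihood, per observation
    p_waic = log_lik.var(axis=0, ddof=1)
    elpd = (lppd - p_waic).sum()
    return elpd, p_waic.sum()

rng = np.random.default_rng(7)
fake_log_lik = rng.normal(-1.0, 0.1, size=(4000, 50))  # stand-in posterior draws
elpd, p = waic(fake_log_lik)
```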

Interpreting Bayes Factors

| Bayes Factor (BF10) | Evidence Category | Source |
| --- | --- | --- |
| 1 - 3 | Anecdotal / not worth more than a bare mention | Jeffreys, 1961; Lee & Wagenmakers, 2014 |
| 3 - 10 | Moderate evidence | Jeffreys, 1961; Lee & Wagenmakers, 2014 |
| 10 - 30 | Strong evidence | Jeffreys, 1961; Lee & Wagenmakers, 2014 |
| 30 - 100 | Very strong evidence | Jeffreys, 1961; Lee & Wagenmakers, 2014 |
| > 100 | Extreme / decisive evidence | Jeffreys, 1961; Lee & Wagenmakers, 2014 |

Caution: Bayes factors are highly sensitive to prior specification. A diffuse prior on the alternative hypothesis inflates evidence for the null (the Jeffreys-Lindley paradox). Always conduct a prior sensitivity analysis when reporting Bayes factors (Schad et al., 2021).
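The Savage-Dickey density ratio, and its sensitivity to the prior, can be seen in a toy conjugate-Normal example where the posterior is analytic (all numbers illustrative):

```python
from scipy.stats import norm

# Savage-Dickey ratio for a point null H0: delta = 0.
# Prior under H1: delta ~ Normal(0, prior_sd); likelihood: ybar ~ Normal(delta, se^2)
ybar, se = 0.4, 0.2
prior_sd = 1.0

# Conjugate update gives the posterior over delta in closed form
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
post_mean = post_var * (ybar / se**2)

# BF01 = posterior density at delta = 0 divided by prior density at delta = 0
bf01 = norm.pdf(0.0, post_mean, post_var**0.5) / norm.pdf(0.0, 0.0, prior_sd)
bf10 = 1.0 / bf01

# Widening prior_sd lowers the prior density everywhere, including at 0,
# which mechanically inflates BF01 -- the Jeffreys-Lindley effect
```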

Posterior Predictive Checks

After model comparison, the selected model must demonstrate it can reproduce key features of the observed data:

  1. Simulate data from the posterior predictive distribution (draw parameters from posterior, then generate synthetic data)
  2. Compare summary statistics: mean RT, RT quantiles (0.1, 0.3, 0.5, 0.7, 0.9), accuracy, conditional accuracy functions
  3. Visual checks: overlay posterior predictive density on observed data; Q-Q plots; residual distributions
  4. Quantitative checks: Posterior predictive p-values for test statistics of interest; values near 0 or 1 indicate misfit (Gelman et al., 2013, Ch. 6)

Domain-specific checks: For RT models, always check the fit to the full RT distribution (not just the mean). Cognitive models derive their power from fitting distributional shape -- a model that matches mean RT but misses the right tail is inadequate (Ratcliff & McKoon, 2008).
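A sketch of a tail-sensitive posterior predictive check in numpy, comparing a model that matches the generating process against one with a too-light right tail. For brevity the posterior is collapsed to point parameters; a real check draws fresh parameters from the posterior for each replicate:

```python
import numpy as np

rng = np.random.default_rng(11)

# "Observed" RTs and a tail-sensitive test statistic (the 0.9 quantile,
# which a mean-only check would miss)
observed_rt = rng.lognormal(-0.4, 0.4, size=300)
t_obs = np.quantile(observed_rt, 0.9)

def ppp(mu, sigma, n_rep=2000):
    """Posterior predictive p-value for the 0.9 RT quantile under a
    lognormal model with fixed (point) parameters."""
    t_rep = np.array([
        np.quantile(rng.lognormal(mu, sigma, size=observed_rt.size), 0.9)
        for _ in range(n_rep)
    ])
    return np.mean(t_rep >= t_obs)

ppp_good = ppp(mu=-0.4, sigma=0.4)   # matches the generating process
ppp_bad = ppp(mu=-0.4, sigma=0.2)    # underestimates the right tail
# ppp_bad near 0 flags the missed right tail; ppp_good should be moderate
```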

Common Pitfalls

1. Non-Identifiability

Problem: Two or more parameters trade off so that many parameter combinations yield equivalent likelihoods. Common in RL models (learning rate vs. inverse temperature) and DDM (boundary vs. drift rate with few conditions).

Detection: Pairwise posterior scatter plots show strong correlations or ridges; marginal posteriors are much wider than expected.

Fix: Add conditions that differentially constrain parameters; use informative priors; reparameterize (e.g., the ratio v/a in DDM; Wilson & Collins, 2019).

2. Prior Sensitivity

Problem: Posterior conclusions change substantially when priors are varied within a reasonable range.

Detection: Re-fit with 2-3 alternative prior specifications and compare posteriors (Schad et al., 2021).

Fix: Collect more data; use more informative priors justified by previous literature; report sensitivity analysis in the paper.

3. Label Switching

Problem: In mixture models, MCMC chains swap component labels, creating multimodal marginal posteriors even when the model is well-identified.

Detection: Trace plots show "switching" between modes; R-hat is high even with long chains.

Fix: Impose ordering constraints (e.g., mu_1 < mu_2); use label-invariant summaries; post-hoc relabeling (Stephens, 2000).

4. Improper Posterior Geometry

Problem: Funnel-shaped posterior in hierarchical models where the group SD approaches zero, creating an increasingly narrow funnel that the sampler cannot traverse.

Detection: Divergent transitions concentrated near low group-SD values (Betancourt & Girolami, 2015).

Fix: Non-centered parameterization (see Step 3 in Model Structure Decision Tree above).

5. Insufficient Trials Per Participant

Problem: With too few trials, individual-level parameters are poorly constrained even in hierarchical models.

Guideline: For DDM, minimum 40-60 trials per condition per participant for stable hierarchical estimation (Wiecki et al., 2013; Ratcliff & Childers, 2015). For simpler models (e.g., binomial SDT), 20-30 trials may suffice with hierarchical priors (Lee & Wagenmakers, 2014).

Fix: If data are already collected, rely more heavily on hierarchical shrinkage and report wide credible intervals honestly.

Reporting Checklist

When reporting Bayesian cognitive models in a manuscript:

  1. Model specification: Full generative model with likelihood and all priors (consider a graphical model plate diagram)
  2. Prior justification: Why each prior was chosen; cite sources for domain-informed priors
  3. Prior predictive check: Confirm priors generate plausible data
  4. Software and sampler settings: Package, version, number of chains (minimum 4; Vehtari et al., 2021), warmup iterations, sampling iterations, adapt_delta
  5. Convergence diagnostics: R-hat (< 1.01), bulk-ESS (> 400), tail-ESS (> 400), divergent transitions (0)
  6. Posterior summaries: Means/medians, credible intervals (89% or 95% HDI; Kruschke, 2015, Ch. 12), and full posterior distributions where space allows
  7. Posterior predictive checks: Visual and/or quantitative evidence that the model reproduces key data features
  8. Model comparison (if applicable): LOO-CV/WAIC with standard errors; Bayes factors with prior sensitivity
  9. Sensitivity analysis: At least one alternative prior specification with comparison of results
  10. Code and data availability: Share Stan/PyMC code and (where possible) data for reproducibility

Key References

  • Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. arXiv:1701.02434.
  • Betancourt, M., & Girolami, M. (2015). Hamiltonian Monte Carlo for hierarchical models. In Current Trends in Bayesian Methodology with Applications. Chapman and Hall/CRC.
  • Daw, N. D. (2011). Trial-by-trial data analysis using computational models. In Decision Making, Affect, and Learning. Oxford University Press.
  • Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., & Gelman, A. (2019). Visualization in Bayesian workflow. Journal of the Royal Statistical Society: Series A, 182(2), 389-402.
  • Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515-534.
  • Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics, 2(4), 1360-1383.
  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC.
  • Gershman, S. J. (2016). Empirical priors for reinforcement learning models. Journal of Mathematical Psychology, 71, 1-6.
  • Kruschke, J. K. (2015). Doing Bayesian Data Analysis (2nd ed.). Academic Press.
  • Lee, M. D., & Wagenmakers, E. J. (2014). Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.
  • Schad, D. J., Betancourt, M., & Vasishth, S. (2021). Toward a principled Bayesian workflow in cognitive science. Psychological Methods, 26(1), 103-126.
  • Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413-1432.
  • Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC. Bayesian Analysis, 16(2), 667-718.
  • Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7, 14.
  • Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547.
