Stan Development
Use this skill when working with Stan models for Bayesian inference and probabilistic programming, particularly when integrating with R (cmdstanr) or Python (cmdstanpy).
Modern Stan Syntax
Array Declarations (IMPORTANT)
Use new array syntax introduced in Stan 2.26+:
// Good - Modern syntax (Stan 2.26+) array[10] int x; // Array of 10 integers array[N] real y; // Array of N reals array[N, M] real z; // 2D array array[N] vector[K] v; // Array of vectors
// Avoid - Old syntax (deprecated) int x[10]; // Old style real y[N]; // Old style real z[N, M]; // Old style vector[K] v[N]; // Old style
Why the change?
-
More consistent and readable
-
Clearer distinction between arrays and matrix/vector types
-
Aligns with modern Stan style
Other Key Syntax
Data types:
// Scalar types int n; real x;
// Vector/Matrix types vector[N] v; // Column vector row_vector[N] rv; // Row vector matrix[N, M] A; // Matrix
// Arrays of vectors/matrices array[K] vector[N] vectors; array[K] matrix[N, M] matrices;
// Constrained types real<lower=0> sigma; // Lower bound real<lower=0, upper=1> theta; // Bounded simplex[K] pi; // Simplex constraint ordered[N] sorted_values; // Ordered constraint
Stan File Organization
Standard Structure
project/ ├── inst/stan/ # Stan models (R packages) │ ├── model.stan # Main model file │ ├── functions/ # Reusable Stan functions │ │ ├── likelihood.stan │ │ ├── priors.stan │ │ └── utils.stan │ └── chunks/ # Reusable code chunks (optional) └── stan/ # Alternative location (Python projects)
Modular Stan Code
Main model includes functions:
functions { #include functions/likelihood.stan #include functions/priors.stan #include functions/utils.stan }
data { int<lower=0> N; array[N] real y; }
parameters { real mu; real<lower=0> sigma; }
model { // Use functions from included files y ~ custom_likelihood(mu, sigma); mu ~ custom_prior(); }
Benefits of modular approach:
-
Reusable functions across models
-
Easier testing of individual components
-
Better organization for complex models
-
Cleaner main model files
Stan Integration with R (cmdstanr)
Basic Workflow
library(cmdstanr)
Compile Stan model
model <- cmdstan_model("path/to/model.stan")
Prepare data (list with names matching Stan data block)
stan_data <- list( N = 100, y = rnorm(100) )
Fit model
fit <- model$sample( data = stan_data, chains = 4, parallel_chains = 4, iter_warmup = 1000, iter_sampling = 1000 )
Extract samples
draws <- fit$draws() summary <- fit$summary()
Model Compilation and Caching
Get model path from package
model_path <- system.file("stan", "model.stan", package = "mypackage")
Compile with cmdstanr
model <- cmdstan_model(model_path)
Package-specific model functions
Many packages provide wrapper functions:
my_package_model <- get_model() # Returns compiled model
Inference Algorithms
NUTS sampling (default, most robust)
fit <- model$sample(data = stan_data)
Variational inference (faster, approximate)
fit <- model$variational(data = stan_data)
Pathfinder (fast, approximate)
fit <- model$pathfinder(data = stan_data)
Laplace approximation (very fast, approximate)
fit <- model$laplace(data = stan_data)
Optimization (MAP estimate)
fit <- model$optimize(data = stan_data)
Testing Stan Functions
Exposing Stan Functions to R
In R packages with cmdstanr:
Expose Stan functions for testing
Typically in a package function
expose_stan_functions <- function() { model_path <- system.file("stan", "functions.stan", package = "mypackage")
Note: This typically works on Linux only
cmdstanr::cmdstan_model(model_path, compile_standalone = TRUE) }
Then in tests
test_that("Stan function works correctly", {
Exposed functions are now available in R
result <- stan_function_name(args) expect_equal(result, expected) })
Common pattern in test setup:
tests/testthat/setup.R
if (on_linux()) { expose_stan_functions() }
tests/testthat/test-stan-functions.R
test_that("likelihood function works", { skip_if_not(on_linux()) # Stan function exposure often Linux-only
result <- stan_likelihood(data, params) expect_gt(result, -Inf) })
Stan-R Interface Patterns
Data Preparation
Convert R data structures to Stan format
stan_data_list <- list( N = nrow(data), K = ncol(X), y = data$outcome, X = as.matrix(X),
Arrays use R vectors/matrices directly
group = as.integer(data$group) )
For complex models, packages often provide conversion functions
stan_data <- prepare_stan_data(preprocessed_data, model_spec)
Prior Specification as Data
Empirical Bayes approach - priors as data:
data { // Data int<lower=0> N; array[N] real y;
// Priors as data (empirical Bayes) real prior_mu_mean; real<lower=0> prior_mu_sd; real<lower=0> prior_sigma_alpha; real<lower=0> prior_sigma_beta; }
parameters { real mu; real<lower=0> sigma; }
model { // Use data-specified priors mu ~ normal(prior_mu_mean, prior_mu_sd); sigma ~ gamma(prior_sigma_alpha, prior_sigma_beta); y ~ normal(mu, sigma); }
Benefits:
-
Priors can be updated without recompiling
-
Easy to specify different priors for different models
-
Enables automated prior selection
Distribution Lookup Systems
For packages with multiple distribution options:
R side - get distribution ID
dist_id <- get_distribution_id("lognormal")
Stan side - use distribution
stan_data <- list( distribution = dist_id, # Integer ID
... other data
)
// Stan functions for distribution dispatch real compute_lpdf(real y, int distribution, vector params) { if (distribution == 1) { // Normal return normal_lpdf(y | params[1], params[2]); } else if (distribution == 2) { // Lognormal return lognormal_lpdf(y | params[1], params[2]); } // ... other distributions }
Common Stan Patterns
Vectorization
// Good - vectorized (much faster) y ~ normal(mu, sigma);
// Avoid - loop (slower) for (n in 1:N) { y[n] ~ normal(mu, sigma); }
Efficient Matrix Operations
// Use built-in matrix operations vector[N] mu = X * beta; // Matrix-vector multiplication
// Avoid loops when vectorization possible
Handling Missing Data
data { int<lower=0> N; int<lower=0> N_obs; array[N_obs] int<lower=1, upper=N> obs_idx; array[N_obs] real y_obs; }
parameters { array[N] real y_latent; real mu; real<lower=0> sigma; }
model { // Likelihood for observed data y_obs ~ normal(mu, sigma);
// Prior/model for all data y_latent ~ normal(mu, sigma);
// Constrain observed values y_latent[obs_idx] ~ normal(y_obs, 0.001); // Small noise }
Debugging Stan Models
Common Issues
Divergences:
Increase adapt_delta
fit <- model$sample( data = stan_data, adapt_delta = 0.95 # Default 0.8 )
Slow mixing:
Use non-centered parameterization
Reparameterize hierarchical models
Model diagnostics:
Check convergence
fit$diagnostic_summary()
Check R-hat, ESS
fit$summary(c("Rhat", "ess_bulk", "ess_tail"))
Pairs plot for problematic parameters
bayesplot::mcmc_pairs(fit$draws(), pars = c("mu", "sigma"))
When to Use This Skill
Activate this skill when:
-
Writing Stan models
-
Integrating Stan with R packages (cmdstanr)
-
Testing Stan functions
-
Debugging Stan models
-
Working with Bayesian hierarchical models
-
Implementing custom likelihoods or priors in Stan
This skill provides Stan-specific development patterns. Project-specific model architecture and domain knowledge should remain in project CLAUDE.md files.