Stan Development

Use this skill when working with Stan models for Bayesian inference and probabilistic programming, particularly when integrating with R (cmdstanr) or Python (cmdstanpy).

Modern Stan Syntax

Array Declarations (IMPORTANT)

Use new array syntax introduced in Stan 2.26+:

// Good - Modern syntax (Stan 2.26+) array[10] int x; // Array of 10 integers array[N] real y; // Array of N reals array[N, M] real z; // 2D array array[N] vector[K] v; // Array of vectors

// Avoid - Old syntax (deprecated) int x[10]; // Old style real y[N]; // Old style real z[N, M]; // Old style vector[K] v[N]; // Old style

Why the change?

More consistent and readable
Clearer distinction between arrays and matrix/vector types
Aligns with modern Stan style

Other Key Syntax

Data types:

// Scalar types int n; real x;

// Vector/Matrix types vector[N] v; // Column vector row_vector[N] rv; // Row vector matrix[N, M] A; // Matrix

// Arrays of vectors/matrices array[K] vector[N] vectors; array[K] matrix[N, M] matrices;

// Constrained types real<lower=0> sigma; // Lower bound real<lower=0, upper=1> theta; // Bounded simplex[K] pi; // Simplex constraint ordered[N] sorted_values; // Ordered constraint

Stan File Organization

Standard Structure

project/ ├── inst/stan/ # Stan models (R packages) │ ├── model.stan # Main model file │ ├── functions/ # Reusable Stan functions │ │ ├── likelihood.stan │ │ ├── priors.stan │ │ └── utils.stan │ └── chunks/ # Reusable code chunks (optional) └── stan/ # Alternative location (Python projects)

Modular Stan Code

Main model includes functions:

functions { #include functions/likelihood.stan #include functions/priors.stan #include functions/utils.stan }

data { int<lower=0> N; array[N] real y; }

parameters { real mu; real<lower=0> sigma; }

model { // Use functions from included files y ~ custom_likelihood(mu, sigma); mu ~ custom_prior(); }

Benefits of modular approach:

Reusable functions across models
Easier testing of individual components
Better organization for complex models
Cleaner main model files

Stan Integration with R (cmdstanr)

Basic Workflow

library(cmdstanr)

Compile Stan model

model <- cmdstan_model("path/to/model.stan")

Prepare data (list with names matching Stan data block)

stan_data <- list( N = 100, y = rnorm(100) )

Fit model

fit <- model$sample( data = stan_data, chains = 4, parallel_chains = 4, iter_warmup = 1000, iter_sampling = 1000 )

Extract samples

draws <- fit$draws() summary <- fit$summary()

Model Compilation and Caching

Get model path from package

model_path <- system.file("stan", "model.stan", package = "mypackage")

Compile with cmdstanr

model <- cmdstan_model(model_path)

Package-specific model functions

Many packages provide wrapper functions:

my_package_model <- get_model() # Returns compiled model

Inference Algorithms

NUTS sampling (default, most robust)

fit <- model$sample(data = stan_data)

Variational inference (faster, approximate)

fit <- model$variational(data = stan_data)

Pathfinder (fast, approximate)

fit <- model$pathfinder(data = stan_data)

Laplace approximation (very fast, approximate)

fit <- model$laplace(data = stan_data)

Optimization (MAP estimate)

fit <- model$optimize(data = stan_data)

Testing Stan Functions

Exposing Stan Functions to R

In R packages with cmdstanr:

Expose Stan functions for testing

Typically in a package function

expose_stan_functions <- function() { model_path <- system.file("stan", "functions.stan", package = "mypackage")

Note: This typically works on Linux only

cmdstanr::cmdstan_model(model_path, compile_standalone = TRUE) }

Then in tests

test_that("Stan function works correctly", {

Exposed functions are now available in R

result <- stan_function_name(args) expect_equal(result, expected) })

Common pattern in test setup:

tests/testthat/setup.R

if (on_linux()) { expose_stan_functions() }

tests/testthat/test-stan-functions.R

test_that("likelihood function works", { skip_if_not(on_linux()) # Stan function exposure often Linux-only

result <- stan_likelihood(data, params) expect_gt(result, -Inf) })

Stan-R Interface Patterns

Data Preparation

Convert R data structures to Stan format

stan_data_list <- list( N = nrow(data), K = ncol(X), y = data$outcome, X = as.matrix(X),

Arrays use R vectors/matrices directly

group = as.integer(data$group) )

For complex models, packages often provide conversion functions

stan_data <- prepare_stan_data(preprocessed_data, model_spec)

Prior Specification as Data

Empirical Bayes approach - priors as data:

data { // Data int<lower=0> N; array[N] real y;

// Priors as data (empirical Bayes) real prior_mu_mean; real<lower=0> prior_mu_sd; real<lower=0> prior_sigma_alpha; real<lower=0> prior_sigma_beta; }

parameters { real mu; real<lower=0> sigma; }

model { // Use data-specified priors mu ~ normal(prior_mu_mean, prior_mu_sd); sigma ~ gamma(prior_sigma_alpha, prior_sigma_beta); y ~ normal(mu, sigma); }

Benefits:

Priors can be updated without recompiling
Easy to specify different priors for different models
Enables automated prior selection

Distribution Lookup Systems

For packages with multiple distribution options:

R side - get distribution ID

dist_id <- get_distribution_id("lognormal")

Stan side - use distribution

stan_data <- list( distribution = dist_id, # Integer ID

... other data

)

// Stan functions for distribution dispatch real compute_lpdf(real y, int distribution, vector params) { if (distribution == 1) { // Normal return normal_lpdf(y | params[1], params[2]); } else if (distribution == 2) { // Lognormal return lognormal_lpdf(y | params[1], params[2]); } // ... other distributions }

Common Stan Patterns

Vectorization

// Good - vectorized (much faster) y ~ normal(mu, sigma);

// Avoid - loop (slower) for (n in 1:N) { y[n] ~ normal(mu, sigma); }

Efficient Matrix Operations

// Use built-in matrix operations vector[N] mu = X * beta; // Matrix-vector multiplication

// Avoid loops when vectorization possible

Handling Missing Data

data { int<lower=0> N; int<lower=0> N_obs; array[N_obs] int<lower=1, upper=N> obs_idx; array[N_obs] real y_obs; }

parameters { array[N] real y_latent; real mu; real<lower=0> sigma; }

model { // Likelihood for observed data y_obs ~ normal(mu, sigma);

// Prior/model for all data y_latent ~ normal(mu, sigma);

// Constrain observed values y_latent[obs_idx] ~ normal(y_obs, 0.001); // Small noise }

Debugging Stan Models

Common Issues

Divergences:

Increase adapt_delta

fit <- model$sample( data = stan_data, adapt_delta = 0.95 # Default 0.8 )

Slow mixing:

Use non-centered parameterization

Reparameterize hierarchical models

Model diagnostics:

Check convergence

fit$diagnostic_summary()

Check R-hat, ESS

fit$summary(c("Rhat", "ess_bulk", "ess_tail"))

Pairs plot for problematic parameters

bayesplot::mcmc_pairs(fit$draws(), pars = c("mu", "sigma"))

When to Use This Skill

Activate this skill when:

Writing Stan models
Integrating Stan with R packages (cmdstanr)
Testing Stan functions
Debugging Stan models
Working with Bayesian hierarchical models
Implementing custom likelihoods or priors in Stan

This skill provides Stan-specific development patterns. Project-specific model architecture and domain knowledge should remain in project CLAUDE.md files.

stan-development

Safety Notice

Copy this and send it to your AI assistant to learn