RStan to PyStan Conversion

This skill provides guidance for converting RStan (R interface to Stan) code to PyStan (Python interface to Stan), focusing on the significant API differences between the two libraries.

Key Insight: Stan Model Code is Language-Agnostic

The Stan modeling language itself is identical between RStan and PyStan. The Stan model code (data blocks, parameters, model, generated quantities) can typically be copied directly. The conversion challenge lies in the wrapper code that:

Prepares data for the model
Calls the sampler with correct parameters
Extracts and processes posterior samples

Pre-Conversion Checklist

Before writing any conversion code:

Verify system dependencies for PyStan

PyStan 3.x requires a C++ compiler (g++ on Linux, clang on macOS)
Install with: apt-get install g++ or equivalent
Missing compiler causes cryptic compilation errors at runtime

Identify PyStan version to target

PyStan 2.x API mirrors RStan closely
PyStan 3.x has a completely different API (recommended for new projects)
This guide focuses on PyStan 3.x conversion

Read the complete R script before converting

Identify all hyperparameters used
Note any data transformations
Understand the expected output format

Hyperparameter Mapping (RStan to PyStan 3.x)

Create an explicit mapping table before coding:

RStan Parameter PyStan 3.x Equivalent Notes

iter

N/A (see calculation) Total iterations including warmup

warmup

num_warmup

Number of warmup iterations

chains

num_chains

Number of MCMC chains

thin

N/A PyStan 3 does not support thinning directly

seed

random_seed

Random seed for reproducibility

control=list(adapt_delta=X)

N/A Not directly available in PyStan 3

control=list(max_treedepth=X)

N/A Not directly available in PyStan 3

Critical calculation for num_samples:

PyStan num_samples = (RStan iter - RStan warmup) / RStan thin

Example: If RStan uses iter=2000, warmup=1000, thin=2 :

Effective samples = (2000 - 1000) / 2 = 500
Use num_samples=500 in PyStan 3

Sample Extraction: Critical API Difference

This is the most common source of errors in conversion.

RStan sample extraction:

Returns matrix with shape (n_iterations, n_parameters)

samples <- extract(fit)$parameter_name

For vector parameters, shape is (n_iterations, param_length)

PyStan 3.x sample extraction:

Returns array with shape (param_length, n_samples) - TRANSPOSED!

samples = fit['parameter_name']

Single parameters have shape (n_samples,)

Verification approach:

Always add shape debugging in first version

for param in ['rho', 'beta', 'sigma']: print(f"{param} shape: {fit[param].shape}")

Step-by-Step Conversion Process

Step 1: Analyze the RStan Script

Extract the Stan model code (between stan() or in separate .stan file)
Document all hyperparameters with their values
Identify data preparation steps
Note the expected output format and calculations

Step 2: Set Up Python Environment

Verify compiler availability first

import subprocess result = subprocess.run(['g++', '--version'], capture_output=True) if result.returncode != 0: raise RuntimeError("g++ not found - install C++ compiler first")

Then install and import

import stan # PyStan 3.x

Step 3: Translate the Stan Model

Copy Stan model code directly (it's language-agnostic)
Store as a Python string with proper escaping
Verify all data block variables match your Python data preparation

Step 4: Prepare Data Dictionary

RStan uses list(), PyStan uses dict

Ensure all variable names match the Stan data block exactly

data = { 'N': len(y), 'K': X.shape[1], 'y': y.tolist(), # Convert numpy arrays to lists 'X': X.tolist() }

Step 5: Build and Sample

PyStan 3.x pattern

posterior = stan.build(stan_code, data=data) fit = posterior.sample( num_chains=chains, num_samples=num_samples, # Calculated from RStan params num_warmup=warmup, random_seed=seed )

Step 6: Extract Results with Shape Verification

ALWAYS verify shapes before computing statistics

samples = fit['parameter_name'] print(f"Shape: {samples.shape}") # Debug first

For posterior means:

- If shape is (n_samples,): use samples.mean()

- If shape is (param_dim, n_samples): use samples.mean(axis=1)

Common Pitfalls and Solutions

Pitfall 1: Assuming RStan-like sample shapes

Symptom: Getting 1000 values when expecting 3 (or vice versa)
Cause: PyStan 3 returns (param_dim, n_samples) not (n_samples, param_dim)
Solution: Always check .shape before computing statistics

Pitfall 2: Missing C++ compiler

Symptom: Compilation errors when building Stan model
Cause: PyStan compiles Stan to C++ at runtime
Solution: Install g++ before attempting to run PyStan code

Pitfall 3: Incorrect num_samples calculation

Symptom: Different number of posterior samples than expected
Cause: Not accounting for thinning in calculation
Solution: Use formula: num_samples = (iter - warmup) / thin

Pitfall 4: Data type mismatches

Symptom: Stan compilation or runtime errors about data types
Cause: NumPy arrays not converted to Python lists
Solution: Use .tolist() on numpy arrays in data dictionary

Pitfall 5: Ignoring sampling warnings

Symptom: Warnings about Cholesky decomposition, divergences, etc.
Cause: Can indicate model issues or poor sampling
Solution: While often transient, log warnings and verify results make sense

Verification Strategy

After conversion, verify the following:

Sample counts match expectations

expected_samples = (rstan_iter - rstan_warmup) // rstan_thin assert fit['param'].shape[-1] == expected_samples

Parameter dimensions are correct

For a 3-dimensional parameter vector

assert fit['beta'].shape[0] == 3

Posterior statistics are reasonable

Compare means, check they're in expected ranges

posterior_mean = fit['param'].mean(axis=-1) print(f"Posterior mean: {posterior_mean}")

Output format matches requirements

Verify JSON/CSV structure
Check key names and data types
Ensure numerical precision is adequate

Minimal Test Pattern

Before full conversion, write a minimal test to understand PyStan's behavior:

import stan

Simple model to verify API understanding

code = """ data { int N; array[N] real y; } parameters { real mu; real<lower=0> sigma; } model { y ~ normal(mu, sigma); } """

data = {'N': 10, 'y': [1.0]*10} posterior = stan.build(code, data=data) fit = posterior.sample(num_chains=1, num_samples=100)

Verify you understand the output structure

print(f"mu shape: {fit['mu'].shape}") print(f"sigma shape: {fit['sigma'].shape}")

rstan-to-pystan

Safety Notice

Copy this and send it to your AI assistant to learn