PufferLib - High-Performance Reinforcement Learning
Overview
PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It achieves training at millions of steps per second through optimized vectorization, native multi-agent support, and efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and seamless integration with Gymnasium, PettingZoo, and specialized RL frameworks.
When to Use This Skill
Use this skill when:
- Training RL agents with PPO on any environment (single or multi-agent)
- Creating custom environments using the PufferEnv API
- Optimizing performance for parallel environment simulation (vectorization)
- Integrating existing environments from Gymnasium, PettingZoo, Atari, Procgen, etc.
- Developing policies with CNN, LSTM, or custom architectures
- Scaling RL to millions of steps per second for faster experimentation
- Multi-agent RL with native multi-agent environment support
Core Capabilities
- High-Performance Training (PuffeRL)
PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.
Quick start training:
```bash
# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4

# Distributed training
torchrun --nproc_per_node=4 train.py
```
Python training loop:
```python
import pufferlib
from pufferlib import PuffeRL

# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Create trainer
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768,
)

# Training loop
for iteration in range(num_iterations):
    trainer.evaluate()      # Collect rollouts
    trainer.train()         # Train on batch
    trainer.mean_and_log()  # Log results
```
For comprehensive training guidance, read references/training.md for:
- Complete training workflow and CLI options
- Hyperparameter tuning with Protein
- Distributed multi-GPU/multi-node training
- Logger integration (Weights & Biases, Neptune)
- Checkpointing and resume training (see the sketch after this list)
- Performance optimization tips
- Curriculum learning patterns
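PuffeRL's built-in checkpointing is documented in the reference; as a minimal sketch using only plain PyTorch, saving and restoring the policy weights from the training loop above looks roughly like this (the `my_policy` and `iteration` names come from the earlier example, and the checkpoint path is a hypothetical choice, not the library's own checkpoint API):

```python
import os
import torch

# Save policy weights periodically (plain PyTorch here; PuffeRL's own
# checkpointing utilities are covered in references/training.md)
os.makedirs('checkpoints', exist_ok=True)
checkpoint_path = 'checkpoints/policy_latest.pt'  # hypothetical path

torch.save({
    'policy_state_dict': my_policy.state_dict(),
    'iteration': iteration,
}, checkpoint_path)

# Resume later by restoring the weights before continuing training
state = torch.load(checkpoint_path)
my_policy.load_state_dict(state['policy_state_dict'])
```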
- Environment Development (PufferEnv)
Create custom high-performance environments with the PufferEnv API.
Basic environment structure:
```python
import numpy as np
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)

        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)
        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Execute action, compute reward, check done
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}
        return obs, reward, done, info
```
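Before vectorizing, it helps to sanity-check the environment by driving it directly with random actions. The sketch below assumes the `MyEnvironment` class above with its placeholder helpers (`_get_observation`, `_compute_reward`, `_is_done`) filled in:

```python
import numpy as np

env = MyEnvironment()
obs = env.reset()

for _ in range(100):
    action = np.random.randint(0, 4)  # matches the 4-way discrete action space above
    obs, reward, done, info = env.step(action)
    assert obs.shape == (4,) and obs.dtype == np.float32
    if done:
        obs = env.reset()
```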
Use the template script: scripts/env_template.py provides complete single-agent and multi-agent environment templates with examples of:
- Different observation space types (vector, image, dict)
- Action space variations (discrete, continuous, multi-discrete)
- Multi-agent environment structure
- Testing utilities
For complete environment development, read references/environments.md for:
- PufferEnv API details and in-place operation patterns
- Observation and action space definitions (see the sketch after this list)
- Multi-agent environment creation
- Ocean suite (20+ pre-built environments)
- Performance optimization (Python to C workflow)
- Environment wrappers and best practices
- Debugging and validation techniques
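The space definitions mentioned above follow the standard Gymnasium space types, whether you build them with the `make_space`/`make_discrete` helpers shown earlier or with `gymnasium.spaces` directly. A minimal sketch of the common variants:

```python
import numpy as np
import gymnasium.spaces as spaces

# Vector observation: 4 unbounded floats
vector_obs = spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

# Image observation: 64x64 RGB, uint8
image_obs = spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8)

# Dict observation combining both
dict_obs = spaces.Dict({'vector': vector_obs, 'image': image_obs})

# Discrete, multi-discrete, and continuous actions
discrete_act = spaces.Discrete(4)
multi_act = spaces.MultiDiscrete([3, 3, 2])
continuous_act = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
```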
- Vectorization and Performance
Achieve maximum throughput with optimized parallel simulation.
Vectorization setup:
```python
import pufferlib

# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)
```
Performance benchmarks:
- Pure Python envs: 100k-500k SPS
- C-based envs: 100M+ SPS
- With training: 400k-4M total SPS
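To see where your own setup lands relative to these numbers, you can time the vectorized environment on random actions. This is a rough sketch that assumes the `pufferlib.make` call from above and the `(obs, reward, done, info)` step contract shown elsewhere in this document; the exact batched reset/step signatures may differ:

```python
import time
import numpy as np
import pufferlib

NUM_ENVS = 256
env = pufferlib.make('environment_name', num_envs=NUM_ENVS, num_workers=8)
obs = env.reset()

num_steps = 1_000
start = time.time()
for _ in range(num_steps):
    # Random batched actions; swap in policy inference for realistic numbers
    actions = np.array([env.action_space.sample() for _ in range(NUM_ENVS)])
    obs, rewards, dones, infos = env.step(actions)

elapsed = time.time() - start
print(f'~{num_steps * NUM_ENVS / elapsed:,.0f} steps/second')
```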
Key optimizations:
- Shared memory buffers for zero-copy observation passing
- Busy-wait flags instead of pipes/queues
- Surplus environments for async returns
- Multiple environments per worker
For vectorization optimization, read references/vectorization.md for:
- Architecture and performance characteristics
- Worker and batch size configuration
- Serial vs multiprocessing vs async modes
- Shared memory and zero-copy patterns
- Hierarchical vectorization for large scale
- Multi-agent vectorization strategies
- Performance profiling and troubleshooting
- Policy Development
Build policies as standard PyTorch modules with optional utilities.
Basic policy structure:
```python
import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()
        obs_dim = observation_space.shape[0]
        num_actions = action_space.n

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU(),
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
```
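Because the policy is a plain `nn.Module`, you can exercise it with a dummy batch before wiring it into training. The sketch below constructs Gymnasium spaces for a toy 4-dimensional observation and 4-way discrete action, matching the `Policy` class above:

```python
import torch
import gymnasium.spaces as spaces
from torch.distributions import Categorical

obs_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype='float32')
act_space = spaces.Discrete(4)
policy = Policy(obs_space, act_space)

obs = torch.zeros(8, 4)        # batch of 8 dummy observations
logits, value = policy(obs)    # shapes: (8, 4) and (8, 1)

dist = Categorical(logits=logits)
actions = dist.sample()        # one sampled action per batch element
log_probs = dist.log_prob(actions)
```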
For complete policy development, read references/policies.md for:
- CNN policies for image observations (see the sketch after this list)
- Recurrent policies with optimized LSTM (3x faster inference)
- Multi-input policies for complex observations
- Continuous action policies
- Multi-agent policies (shared vs independent parameters)
- Advanced architectures (attention, residual)
- Observation normalization and gradient clipping
- Policy debugging and testing
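For image observations, a common pattern is an Atari-style convolutional encoder feeding the same actor/critic heads. The sketch below is a generic example built around `layer_init`; it assumes 84x84 frames with a stack of 4 channels and is not necessarily the exact architecture used in references/policies.md:

```python
import torch
import torch.nn as nn
from pufferlib.pytorch import layer_init

class CNNPolicy(nn.Module):
    def __init__(self, num_actions):
        super().__init__()
        # Atari-style encoder for (4, 84, 84) stacked frames
        self.encoder = nn.Sequential(
            layer_init(nn.Conv2d(4, 32, kernel_size=8, stride=4)),
            nn.ReLU(),
            layer_init(nn.Conv2d(32, 64, kernel_size=4, stride=2)),
            nn.ReLU(),
            layer_init(nn.Conv2d(64, 64, kernel_size=3, stride=1)),
            nn.ReLU(),
            nn.Flatten(),
            layer_init(nn.Linear(64 * 7 * 7, 512)),
            nn.ReLU(),
        )
        self.actor = layer_init(nn.Linear(512, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(512, 1), std=1.0)

    def forward(self, observations):
        # Scale uint8 pixels to [0, 1] before the conv stack
        features = self.encoder(observations.float() / 255.0)
        return self.actor(features), self.critic(features)
```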
- Environment Integration
Seamlessly integrate environments from popular RL frameworks.
Gymnasium integration:
```python
import gymnasium as gym
import pufferlib

# Wrap a Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)

# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)
```
PettingZoo multi-agent:
```python
# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)
```
Supported frameworks:
- Gymnasium / OpenAI Gym
- PettingZoo (parallel and AEC)
- Atari (ALE)
- Procgen
- NetHack / MiniHack
- Minigrid
- Neural MMO
- Crafter
- GPUDrive
- MicroRTS
- Griddly
- And more...
For integration details, read references/integration.md for:
- Complete integration examples for each framework
- Custom wrappers (observation, reward, frame stacking, action repeat); see the sketch after this list
- Space flattening and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Integration debugging
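As one example of a custom wrapper, a simple reward-scaling wrapper can be written against the standard Gymnasium `Wrapper` interface before handing the environment to PufferLib. This is a generic sketch; PufferLib-specific wrapper patterns are covered in references/integration.md:

```python
import gymnasium as gym
import pufferlib

class ScaleReward(gym.Wrapper):
    """Multiply every reward by a constant factor."""

    def __init__(self, env, scale=0.1):
        super().__init__(env)
        self.scale = scale

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward * self.scale, terminated, truncated, info

# Wrap, then emulate/vectorize as usual
gym_env = ScaleReward(gym.make('CartPole-v1'), scale=0.1)
env = pufferlib.emulate(gym_env, num_envs=256)
```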
Quick Start Workflow
For Training Existing Environments
- Choose an environment from the Ocean suite or a compatible framework
- Use scripts/train_template.py as a starting point
- Configure hyperparameters for your task
- Run training with the CLI or a Python script
- Monitor with Weights & Biases or Neptune (see the sketch after this list)
- Refer to references/training.md for optimization
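For the monitoring step, logging to Weights & Biases uses the standard `wandb` API around the training loop shown earlier. The metric names and the `mean_reward` value below are placeholders for whatever statistics PuffeRL exposes; the built-in logger integration is described in references/training.md:

```python
import wandb

wandb.init(project='pufferlib-experiments', config={'learning_rate': 3e-4})

for iteration in range(num_iterations):
    trainer.evaluate()
    trainer.train()
    # 'mean_reward' is a placeholder; log whatever scalars you track
    wandb.log({'mean_reward': mean_reward}, step=iteration)
```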
For Creating Custom Environments
- Start with scripts/env_template.py
- Define observation and action spaces
- Implement reset() and step() methods
- Test environment locally
- Vectorize with pufferlib.emulate() or make()
- Refer to references/environments.md for advanced patterns
- Optimize with references/vectorization.md if needed
For Policy Development
- Choose architecture based on observations:
  - Vector observations → MLP policy
  - Image observations → CNN policy
  - Sequential tasks → LSTM policy (see the sketch after this list)
  - Complex observations → Multi-input policy
- Use layer_init for proper weight initialization
- Follow patterns in references/policies.md
- Test with environment before full training
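For sequential tasks, PufferLib provides an optimized LSTM integration (see references/policies.md). As a generic, plain-PyTorch sketch of the recurrent actor-critic idea (PuffeRL's actual recurrent interface and state-handling conventions may differ):

```python
import torch
import torch.nn as nn
from pufferlib.pytorch import layer_init

class LSTMPolicy(nn.Module):
    def __init__(self, obs_dim, num_actions, hidden_size=128):
        super().__init__()
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, hidden_size)), nn.ReLU()
        )
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.actor = layer_init(nn.Linear(hidden_size, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(hidden_size, 1), std=1.0)

    def forward(self, observations, state=None):
        # observations: (batch, time, obs_dim); state carries (h, c) between calls
        features = self.encoder(observations)
        features, state = self.lstm(features, state)
        return self.actor(features), self.critic(features), state
```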
For Performance Optimization
- Profile current throughput (steps per second)
- Check vectorization configuration (num_envs, num_workers)
- Optimize environment code (in-place ops, numpy vectorization); see the sketch after this list
- Consider C implementation for critical paths
- Use references/vectorization.md for systematic optimization
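The "in-place ops" point above refers to reusing preallocated buffers instead of allocating new arrays on every step. A small illustrative sketch, not tied to any particular environment:

```python
import numpy as np

# Allocating a fresh array every step creates garbage-collection pressure
def step_slow(positions, velocities):
    return positions + velocities          # new array on each call

# Writing into a preallocated buffer keeps step() allocation-free
obs_buffer = np.zeros(1024, dtype=np.float32)

def step_fast(positions, velocities):
    np.add(positions, velocities, out=obs_buffer)  # in-place write, same shapes
    return obs_buffer
```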
Resources
scripts/
train_template.py - Complete training script template with:
- Environment creation and configuration
- Policy initialization
- Logger integration (WandB, Neptune)
- Training loop with checkpointing
- Command-line argument parsing
- Multi-GPU distributed training setup
env_template.py - Environment implementation templates:
- Single-agent PufferEnv example (grid world)
- Multi-agent PufferEnv example (cooperative navigation)
- Multiple observation/action space patterns
- Testing utilities
references/
training.md - Comprehensive training guide:
- Training workflow and CLI options
- Hyperparameter configuration
- Distributed training (multi-GPU, multi-node)
- Monitoring and logging
- Checkpointing
- Protein hyperparameter tuning
- Performance optimization
- Common training patterns
- Troubleshooting
environments.md - Environment development guide:
- PufferEnv API and characteristics
- Observation and action spaces
- Multi-agent environments
- Ocean suite environments
- Custom environment development workflow
- Python to C optimization path
- Third-party environment integration
- Wrappers and best practices
- Debugging
vectorization.md - Vectorization optimization:
- Architecture and key optimizations
- Vectorization modes (serial, multiprocessing, async)
- Worker and batch configuration
- Shared memory and zero-copy patterns
- Advanced vectorization (hierarchical, custom)
- Multi-agent vectorization
- Performance monitoring and profiling
- Troubleshooting and best practices
policies.md - Policy architecture guide:
- Basic policy structure
- CNN policies for images
- LSTM policies with optimization
- Multi-input policies
- Continuous action policies
- Multi-agent policies
- Advanced architectures (attention, residual)
- Observation processing and unflattening
- Initialization and normalization
- Debugging and testing
integration.md - Framework integration guide:
- Gymnasium integration
- PettingZoo integration (parallel and AEC)
- Third-party environments (Procgen, NetHack, Minigrid, etc.)
- Custom wrappers (observation, reward, frame stacking, etc.)
- Space conversion and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Debugging integration
Tips for Success
- Start simple: Begin with Ocean environments or Gymnasium integration before creating custom environments
- Profile early: Measure steps per second from the start to identify bottlenecks
- Use templates: scripts/train_template.py and scripts/env_template.py provide solid starting points
- Read references as needed: Each reference file is self-contained and focused on a specific capability
- Optimize progressively: Start with Python, profile, then optimize critical paths with C if needed
- Leverage vectorization: PufferLib's vectorization is key to achieving high throughput
- Monitor training: Use WandB or Neptune to track experiments and identify issues early
- Test environments: Validate environment logic before scaling up training
- Check existing environments: The Ocean suite provides 20+ pre-built environments
- Use proper initialization: Always use layer_init from pufferlib.pytorch for policies
Common Use Cases
Training on Standard Benchmarks
```python
# Atari
env = pufferlib.make('atari-pong', num_envs=256)

# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)
```
Multi-Agent Learning
```python
# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)

# Shared policy for all agents
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)
```
Custom Task Development
```python
# Create custom environment
class MyTask(PufferEnv):
    ...  # implement __init__, reset, and step

# Vectorize and train
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)
```
High-Performance Optimization
```python
# Maximize throughput
env = pufferlib.make(
    'my-env',
    num_envs=1024,       # Large batch
    num_workers=16,      # Many workers
    envs_per_worker=64,  # Optimize per worker
)
```
Installation
```bash
pip install pufferlib
```
Documentation
- Official docs: https://puffer.ai/docs.html
- Discord: Community support available