# Model Registry Maintainer

This skill provides guidance for maintaining MassGen's model registry across two key files:

- `massgen/backend/capabilities.py`: models, capabilities, release dates
- `massgen/token_manager/token_manager.py`: pricing, context windows
## When to Use This Skill

- New model released by a provider
- Model pricing changes
- Context window limits updated
- Model capabilities changed
- New provider/backend added
## Two Files to Maintain

### File 1: capabilities.py (Models & Features)

What it contains:

- List of available models per provider
- Model capabilities (web search, code execution, vision, etc.)
- Release dates
- Default models

Used by:

- Config builder (`--quickstart`, `--generate-config`)
- Documentation generation
- Backend validation

Always update this file for new models.
### File 2: token_manager.py (Pricing & Limits)

What it contains:

- Hardcoded pricing/context windows for models NOT in the LiteLLM database
- On-demand loading from the LiteLLM database (500+ models)

Used by:

- Cost estimation
- Token counting
- Context management

Pricing resolution order:

1. LiteLLM database (fetched on demand, cached for 1 hour)
2. Hardcoded `PROVIDER_PRICING` (fallback only)
3. Pattern-matching heuristics

Only update `PROVIDER_PRICING` if:

- The model is NOT in the LiteLLM database
- The LiteLLM pricing is incorrect or outdated
- The model is custom/internal to your organization
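The resolution order above can be sketched as follows. This is an illustrative sketch, not the actual token_manager.py internals; `FALLBACK_PRICING` and `resolve_pricing` are made-up names standing in for the real implementation:

```python
from typing import Optional, Tuple

# Illustrative fallback data; the real PROVIDER_PRICING lives in token_manager.py.
FALLBACK_PRICING = {"gpt-4o": (0.0025, 0.01)}  # (input_per_1k, output_per_1k)

def resolve_pricing(model: str, litellm_db: dict) -> Optional[Tuple[float, float]]:
    # 1. LiteLLM database (fetched on demand, cached)
    info = litellm_db.get(model)
    if info and "input_cost_per_token" in info:
        return (info["input_cost_per_token"] * 1000,
                info["output_cost_per_token"] * 1000)
    # 2. Hardcoded fallback pricing (exact name match)
    if model in FALLBACK_PRICING:
        return FALLBACK_PRICING[model]
    # 3. Pattern-matching heuristics: longest known prefix wins
    for name in sorted(FALLBACK_PRICING, key=len, reverse=True):
        if model.startswith(name + "-"):
            return FALLBACK_PRICING[name]
    return None
```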
## Information to Gather for New Models

- Release Date
  - Format: `"YYYY-MM"`
  - Sources:
    - OpenAI: https://openai.com/index
    - Anthropic: https://www.anthropic.com/news
    - Google DeepMind: https://blog.google/technology/google-deepmind/
    - xAI: https://x.ai/news
- Context Window
  - Input context size (tokens)
  - Max output tokens
  - Look for: "context window", "max tokens", "input/output limits"
- Pricing
  - Input cost per 1K tokens (USD)
  - Output cost per 1K tokens (USD)
  - Cached input cost (if applicable)
  - Sources:
    - OpenAI: https://openai.com/api/pricing/
    - Anthropic: https://www.anthropic.com/pricing
    - Google: https://ai.google.dev/pricing
- Capabilities
  - Web search, code execution, vision, reasoning, etc.
  - Check official API documentation
- Model Name
  - Exact API identifier (case-sensitive)
  - Check the provider's model documentation
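A note on the pricing units above: provider pricing pages usually quote USD per 1M tokens, the LiteLLM database stores USD per token, and `ModelPricing` expects USD per 1K. A minimal conversion sketch (helper names are illustrative):

```python
def per_1m_to_per_1k(usd_per_million: float) -> float:
    # Provider pricing pages ($/1M tokens) -> ModelPricing fields ($/1K tokens)
    return usd_per_million / 1000

def per_token_to_per_1k(usd_per_token: float) -> float:
    # LiteLLM database ($/token) -> ModelPricing fields ($/1K tokens)
    return usd_per_token * 1000
```

For example, an input price listed as $1.25 per 1M tokens becomes `0.00125` per 1K, the first `ModelPricing` field.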
## Adding a New Model - Complete Workflow

### Step 1: Add to capabilities.py

Add the model to the `models` list and `model_release_dates`:

```python
# massgen/backend/capabilities.py
"openai": BackendCapabilities(
    # ... existing fields ...
    models=[
        "new-model-name",  # Add here (newest first)
        "gpt-5.1",
        # ... existing models ...
    ],
    model_release_dates={
        "new-model-name": "2025-12",  # Add here
        "gpt-5.1": "2025-11",
        # ... existing dates ...
    },
)
```
### Step 2: Check if Pricing Is in LiteLLM (Usually Skip)

First, check whether the model is already in the LiteLLM database:

```python
import requests

url = "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"
pricing_db = requests.get(url).json()

if "new-model-name" in pricing_db:
    print("✅ Model found in LiteLLM - no need to update token_manager.py")
    print(f"Pricing: ${pricing_db['new-model-name']['input_cost_per_token'] * 1000}/1K input")
else:
    print("❌ Model NOT in LiteLLM - need to add to PROVIDER_PRICING")
```

Only if the model is NOT in LiteLLM, add it to `PROVIDER_PRICING`:

```python
# massgen/token_manager/token_manager.py
PROVIDER_PRICING: Dict[str, Dict[str, ModelPricing]] = {
    "OpenAI": {
        # Format: ModelPricing(input_per_1k, output_per_1k, context_window, max_output)
        "new-model-name": ModelPricing(0.00125, 0.01, 300000, 150000),
        # ... existing models ...
    },
}
```
Provider name mapping:

- "OpenAI" (not "openai")
- "Anthropic" (not "claude")
- "Google" (not "gemini")
- "xAI" (not "grok")
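A small lookup table makes the mapping above explicit. The backend keys on the left are assumed from the capabilities.py examples in this document and should be verified against the real file:

```python
# Assumed backend keys (left) mapped to the PROVIDER_PRICING keys (right).
BACKEND_TO_PRICING_PROVIDER = {
    "openai": "OpenAI",
    "claude": "Anthropic",
    "gemini": "Google",
    "grok": "xAI",
}

def pricing_provider(backend: str) -> str:
    # Fall back to the backend name itself for custom providers.
    return BACKEND_TO_PRICING_PROVIDER.get(backend, backend)
```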
### Step 3: Update Capabilities (if new features)

If the model introduces new capabilities:

```python
supported_capabilities={
    "web_search",
    "code_execution",
    "new_capability",  # Add here
}
```

### Step 4: Update Default Model (if appropriate)

Only change this if the new model should be the recommended default:

```python
default_model="new-model-name"
```
### Step 5: Validate and Test

```bash
# Run capabilities tests
uv run pytest massgen/tests/test_backend_capabilities.py -v

# Test config generation with the new model
massgen --generate-config ./test.yaml --config-backend openai --config-model new-model-name

# Verify the config was created successfully
cat ./test.yaml
```

### Step 6: Regenerate Documentation

```bash
uv run python docs/scripts/generate_backend_tables.py
cd docs && make html
```
## Current Model Data

### OpenAI Models (as of Nov 2025)

In capabilities.py:

```python
models=[
    "gpt-5.1",       # 2025-11
    "gpt-5-codex",   # 2025-09
    "gpt-5",         # 2025-08
    "gpt-5-mini",    # 2025-08
    "gpt-5-nano",    # 2025-08
    "gpt-4.1",       # 2025-04
    "gpt-4.1-mini",  # 2025-04
    "gpt-4.1-nano",  # 2025-04
    "gpt-4o",        # 2024-05
    "gpt-4o-mini",   # 2024-07
    "o4-mini",       # 2025-04
]
```

In token_manager.py (add missing models):

```python
"OpenAI": {
    "gpt-5": ModelPricing(0.00125, 0.01, 400000, 128000),
    "gpt-5-mini": ModelPricing(0.00025, 0.002, 400000, 128000),
    "gpt-5-nano": ModelPricing(0.00005, 0.0004, 400000, 128000),
    "gpt-4o": ModelPricing(0.0025, 0.01, 128000, 16384),
    "gpt-4o-mini": ModelPricing(0.00015, 0.0006, 128000, 16384),
    # Missing: gpt-5.1, gpt-5-codex, gpt-4.1 family, o4-mini
}
```
### Claude Models (as of Nov 2025)

In capabilities.py:

```python
models=[
    "claude-haiku-4-5-20251001",   # 2025-10
    "claude-sonnet-4-5-20250929",  # 2025-09
    "claude-opus-4-1-20250805",    # 2025-08
    "claude-sonnet-4-20250514",    # 2025-05
]
```

In token_manager.py:

```python
"Anthropic": {
    "claude-haiku-4-5": ModelPricing(0.001, 0.005, 200000, 65536),
    "claude-sonnet-4-5": ModelPricing(0.003, 0.015, 200000, 65536),
    "claude-opus-4.1": ModelPricing(0.015, 0.075, 200000, 32768),
    "claude-sonnet-4": ModelPricing(0.003, 0.015, 200000, 8192),
}
```
### Gemini Models (as of Nov 2025)

In capabilities.py:

```python
models=[
    "gemini-3-pro-preview",  # 2025-11
    "gemini-2.5-flash",      # 2025-06
    "gemini-2.5-pro",        # 2025-06
]
```

In token_manager.py (missing gemini-2.5 and gemini-3):

```python
"Google": {
    "gemini-1.5-pro": ModelPricing(0.00125, 0.005, 2097152, 8192),
    "gemini-1.5-flash": ModelPricing(0.000075, 0.0003, 1048576, 8192),
    # Missing: gemini-2.5-pro, gemini-2.5-flash, gemini-3-pro-preview
}
```
### Grok Models (as of Nov 2025)

In capabilities.py:

```python
models=[
    "grok-4-1-fast-reasoning",      # 2025-11
    "grok-4-1-fast-non-reasoning",  # 2025-11
    "grok-code-fast-1",             # 2025-08
    "grok-4",                       # 2025-07
    "grok-4-fast",                  # 2025-09
    "grok-3",                       # 2025-02
    "grok-3-mini",                  # 2025-05
]
```

In token_manager.py (missing grok-3 and grok-4 families):

```python
"xAI": {
    "grok-2-latest": ModelPricing(0.005, 0.015, 131072, 131072),
    "grok-2": ModelPricing(0.005, 0.015, 131072, 131072),
    "grok-2-mini": ModelPricing(0.001, 0.003, 131072, 65536),
    # Missing: grok-3, grok-4, grok-4-1 families
}
```
## Model Name Matching

Important: the names in `PROVIDER_PRICING` use simplified patterns:

- "gpt-5" matches `gpt-5`, `gpt-5-preview`, `gpt-5-*`
- "claude-sonnet-4-5" matches `claude-sonnet-4-5-*` (any date suffix)
- "gemini-2.5-pro" is an exact match

The token manager uses prefix matching for flexibility.
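The matching behavior can be illustrated like this. It is a sketch under the rules described above, not the actual token_manager.py implementation; `PRICING_KEYS` and `match_pricing_key` are illustrative names:

```python
from typing import Optional

# Simplified pricing keys as they appear in PROVIDER_PRICING.
PRICING_KEYS = ["claude-sonnet-4-5", "claude-sonnet-4", "gpt-5", "gemini-2.5-pro"]

def match_pricing_key(model: str) -> Optional[str]:
    # Try longer keys first so "claude-sonnet-4-5-20250929" matches
    # "claude-sonnet-4-5" instead of falling through to "claude-sonnet-4".
    for key in sorted(PRICING_KEYS, key=len, reverse=True):
        if model == key or model.startswith(key + "-"):
            return key
    return None
```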
## Common Tasks

### Task: Add brand new GPT-5.2 model

1. Research: release date, pricing, context window, capabilities
2. Add to the capabilities.py `models` list and `model_release_dates`
3. Add to the token_manager.py `PROVIDER_PRICING["OpenAI"]`
4. Run tests
5. Regenerate docs

### Task: Update pricing for existing model

1. Verify the new pricing from an official source
2. Update only the token_manager.py `PROVIDER_PRICING`
3. No need to touch capabilities.py
4. Document the change in notes if significant

### Task: Add new capability to model

1. Update `supported_capabilities` in capabilities.py
2. Add notes explaining when/how the capability works
3. Update the backend implementation if needed
4. Run tests
## Validation Commands

```bash
# Test the capabilities registry
uv run pytest massgen/tests/test_backend_capabilities.py -v

# Test the token manager
uv run pytest massgen/tests/test_token_manager.py -v

# Generate a config with the new model
massgen --generate-config ./test.yaml --config-backend openai --config-model new-model

# Build docs to verify tables
cd docs && make html
```
## Programmatic Model Updates

### LiteLLM Pricing Database (RECOMMENDED)

The easiest way to get comprehensive model pricing and context window data.

URL: https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json

Coverage: 500+ models across 30+ providers, including:

- OpenAI, Anthropic, Google, xAI
- Together AI, Groq, Cerebras, Fireworks
- AWS Bedrock, Azure, Cohere, and more

Data available:

```json
{
  "gpt-4o": {
    "input_cost_per_token": 0.0000025,
    "output_cost_per_token": 0.00001,
    "max_input_tokens": 128000,
    "max_output_tokens": 16384,
    "supports_vision": true,
    "supports_function_calling": true,
    "supports_prompt_caching": true
  }
}
```
Usage:

```python
import requests

# Fetch the latest pricing
url = "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"
pricing_db = requests.get(url).json()

# Get info for a model
model_info = pricing_db.get("gpt-4o")
input_per_1k = model_info["input_cost_per_token"] * 1000
output_per_1k = model_info["output_cost_per_token"] * 1000
```

To update token_manager.py from LiteLLM:

- Convert per-token costs to per-1K costs
- Extract the context window and max output tokens
- Keep models in reverse chronological order
### OpenRouter API (Real-Time)

For the most up-to-date model list with live pricing:

Endpoint: https://openrouter.ai/api/v1/models

Data available:

- Real-time pricing (prompt, completion, reasoning, caching)
- Context windows and max completion tokens
- Model capabilities and modalities
- 200+ models from multiple providers

Usage:

```python
import os
import requests

headers = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}
response = requests.get("https://openrouter.ai/api/v1/models", headers=headers)
models = response.json()["data"]

for model in models:
    print(f"{model['id']}: ${model['pricing']['prompt']} input, ${model['pricing']['completion']} output")
```
### Provider-Specific APIs

| Provider | Models API | Pricing in API? | Recommendation |
|----------|------------|-----------------|----------------|
| OpenAI | https://api.openai.com/v1/models | ❌ No | Use LiteLLM |
| Claude | No public API | ❌ No | Use LiteLLM |
| Gemini | https://generativelanguage.googleapis.com/v1beta/models | ❌ No | API + LiteLLM |
| Grok (xAI) | https://api.x.ai/v1/models | ❌ No | Use LiteLLM |
| Together AI | https://api.together.xyz/v1/models | ✅ Yes | API directly |
| Groq | https://api.groq.com/openai/v1/models | ❌ No | Use LiteLLM |
| Cerebras | https://api.cerebras.ai/v1/models | ❌ No | Use LiteLLM |
| Fireworks | https://api.fireworks.ai/v1/accounts/{id}/models | ❌ No | Use LiteLLM |
| Azure OpenAI | Azure Management API | ❌ Complex | Manual |
| Claude Code | No API | ❌ No | Manual |
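For the Gemini "API + LiteLLM" row, a sketch of listing models via the public endpoint. The response shape is assumed from the v1beta API (names nested as `"models/gemini-..."`), and `GEMINI_API_KEY` is a placeholder environment variable; pricing still has to come from LiteLLM:

```python
import os
import requests

def parse_model_names(payload: dict) -> list:
    # The v1beta listing nests names like "models/gemini-2.5-pro".
    return [m["name"].removeprefix("models/") for m in payload.get("models", [])]

def list_gemini_models() -> list:
    # Assumes GEMINI_API_KEY is set; the endpoint returns no pricing data.
    url = "https://generativelanguage.googleapis.com/v1beta/models"
    resp = requests.get(url, params={"key": os.environ["GEMINI_API_KEY"]})
    resp.raise_for_status()
    return parse_model_names(resp.json())
```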
## Automation Script

Create `scripts/update_model_pricing.py` to automate updates:

```python
#!/usr/bin/env python3
"""Update token_manager.py pricing from the LiteLLM database."""

import requests

# Fetch the LiteLLM database
url = "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"
pricing_db = requests.get(url).json()

# Filter by provider
openai_models = {k: v for k, v in pricing_db.items() if v.get("litellm_provider") == "openai"}
anthropic_models = {k: v for k, v in pricing_db.items() if v.get("litellm_provider") == "anthropic"}

# Generate ModelPricing entries (some entries lack cost fields, so use .get)
for model_name, info in openai_models.items():
    input_per_1k = info.get("input_cost_per_token", 0) * 1000
    output_per_1k = info.get("output_cost_per_token", 0) * 1000
    context = info.get("max_input_tokens", 0)
    max_output = info.get("max_output_tokens", 0)
    print(f'    "{model_name}": ModelPricing({input_per_1k}, {output_per_1k}, {context}, {max_output}),')
```

Run weekly to keep pricing current:

```bash
uv run python scripts/update_model_pricing.py
```
## Reference Files

- Capabilities registry: massgen/backend/capabilities.py
- Token/pricing manager: massgen/token_manager/token_manager.py
- Capabilities tests: massgen/tests/test_backend_capabilities.py
- Config builder: massgen/config_builder.py
- Doc generator: docs/scripts/generate_backend_tables.py
- LiteLLM database: https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json
- OpenRouter API: https://openrouter.ai/docs/overview/models
## Important Maintenance Notes

- **Keep models in reverse chronological order** - newest first
- **Use exact API names** - match provider documentation exactly
- **Verify pricing units** - always per 1K tokens in token_manager.py
- **Document uncertainties** - if info is estimated or unofficial, note it
- **Update both files** - don't forget token_manager.py when adding models
- **Use LiteLLM for pricing** - comprehensive and frequently updated
- **Test after updates** - run pytest to verify no breaking changes