Spice Workers
Workers coordinate model interactions, enabling load balancing and fallback strategies across multiple models.
Basic Configuration
workers:
- name: <worker_name> type: load_balance description: | Worker description load_balance: routing: - from: <model_name>
Load Balancing Strategies
Round Robin
Distribute requests evenly across models:
workers:
- name: round_robin type: load_balance load_balance: routing: - from: model_a - from: model_b - from: model_c
Fallback (Priority Order)
Try models in order, falling back on failure:
workers:
- name: fallback type: load_balance load_balance: routing: - from: primary_model order: 1 - from: backup_model order: 2 - from: emergency_model order: 3
Weighted Distribution
Route by percentage weight:
workers:
- name: weighted type: load_balance load_balance: routing: - from: fast_model weight: 8 # 80% of traffic - from: slow_model weight: 2 # 20% of traffic
Using Workers
Workers are invoked using the same API as models:
curl http://localhost:8090/v1/chat/completions
-H "Content-Type: application/json"
-d '{
"model": "fallback",
"messages": [{"role": "user", "content": "Hello"}]
}'
Full Example
models:
- from: openai:gpt-4o name: gpt4 params: openai_api_key: ${ secrets:OPENAI_API_KEY }
- from: anthropic:claude-sonnet-4-5 name: claude params: anthropic_api_key: ${ secrets:ANTHROPIC_API_KEY }
workers:
- name: smart_router type: load_balance description: Try GPT-4 first, fall back to Claude load_balance: routing: - from: gpt4 order: 1 - from: claude order: 2
Documentation
-
Workers Overview
-
Workers Reference