# ML Deployment Helper

## Overview
This skill bridges the gap between trained models and production systems. It generates deployment artifacts, APIs, monitoring, and A/B testing infrastructure following MLOps best practices.
## Deployment Checklist

Before deploying any model, this skill ensures:

- ✅ Model versioned and tracked
- ✅ Dependencies documented (requirements.txt/Dockerfile)
- ✅ API endpoint created
- ✅ Input validation implemented
- ✅ Monitoring configured
- ✅ A/B testing ready
- ✅ Rollback plan documented
- ✅ Performance benchmarked
## Deployment Patterns

### Pattern 1: REST API (FastAPI)
```python
from specweave import create_model_api

# Generates a production-ready API
api = create_model_api(
    model_path="models/model-v3.pkl",
    increment="0042",
    framework="fastapi",
)
```
Creates:

```
api/
├── main.py              (FastAPI app)
├── models.py            (Pydantic schemas)
├── predict.py           (Prediction logic)
├── Dockerfile
├── requirements.txt
└── tests/
```
Generated `main.py`:

```python
from datetime import datetime

import joblib
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Recommendation Model API", version="0042-v3")
model = joblib.load("model-v3.pkl")

class PredictionRequest(BaseModel):
    user_id: int
    context: dict

@app.post("/predict")
async def predict(request: PredictionRequest):
    try:
        prediction = model.predict([request.dict()])
        return {
            "recommendations": prediction.tolist(),
            "model_version": "0042-v3",
            "timestamp": datetime.now(),
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
```
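Once the API is running, the endpoint can be exercised with a short client script. A minimal sketch using the `requests` library; the payload fields match the `PredictionRequest` schema above, and the example values are illustrative:

```python
import requests

# Call the /predict endpoint with a payload matching PredictionRequest
response = requests.post(
    "http://localhost:8000/predict",
    json={"user_id": 42, "context": {"page": "home", "device": "mobile"}},
    timeout=5,
)
response.raise_for_status()
print(response.json())  # {"recommendations": [...], "model_version": "0042-v3", ...}
```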
### Pattern 2: Batch Prediction
```python
from specweave import create_batch_predictor

# For offline scoring
batch_predictor = create_batch_predictor(
    model_path="models/model-v3.pkl",
    increment="0042",
    input_path="s3://bucket/data/",
    output_path="s3://bucket/predictions/",
)
```
Creates:

```
batch/
├── predictor.py
├── scheduler.yaml       (Airflow/Kubernetes CronJob)
└── monitoring.py
```
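The generated `predictor.py` is not shown here; the following is a minimal sketch of the batch-scoring loop such a module implements, assuming pandas-readable Parquet inputs and using hypothetical local paths in place of the S3 URIs:

```python
import glob

import joblib
import pandas as pd

model = joblib.load("models/model-v3.pkl")

# Score each input file and write predictions alongside the original rows
for path in glob.glob("data/*.parquet"):  # stand-in for s3://bucket/data/
    df = pd.read_parquet(path)
    df["prediction"] = model.predict(df)
    df.to_parquet(path.replace("data/", "predictions/"))
```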
### Pattern 3: Real-Time Streaming
```python
from specweave import create_streaming_predictor

# For Kafka/Kinesis streams
streaming = create_streaming_predictor(
    model_path="models/model-v3.pkl",
    increment="0042",
    input_topic="user-events",
    output_topic="predictions",
)
```
Creates:

```
streaming/
├── consumer.py
├── predictor.py
├── producer.py
└── docker-compose.yaml
```
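The consumer/predictor/producer trio typically collapses into a single loop. A minimal sketch assuming the `kafka-python` client, JSON-encoded events, and hypothetical `user_id`/`features` event fields; the generated code may differ:

```python
import json

import joblib
from kafka import KafkaConsumer, KafkaProducer

model = joblib.load("models/model-v3.pkl")
consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Consume events, score them, and publish predictions
for message in consumer:
    event = message.value
    prediction = model.predict([event["features"]])
    producer.send("predictions", {"user_id": event["user_id"],
                                  "prediction": prediction.tolist()})
```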
## Containerization
```python
from specweave import containerize_model

# Generates an optimized Dockerfile
dockerfile = containerize_model(
    model_path="models/model-v3.pkl",
    framework="sklearn",
    python_version="3.10",
    increment="0042",
)
```
Generated `Dockerfile`:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Copy model and dependencies
COPY models/model-v3.pkl /app/model.pkl
COPY requirements.txt /app/

# Install curl for the health check (not included in slim images),
# then install Python dependencies
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --no-cache-dir -r requirements.txt

# Copy application
COPY api/ /app/api/

# Health check
HEALTHCHECK --interval=30s --timeout=3s \
    CMD curl -f http://localhost:8000/health || exit 1

# Run API
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
## Monitoring Setup
```python
from specweave import setup_model_monitoring

# Configures monitoring for production
monitoring = setup_model_monitoring(
    model_name="recommendation-model",
    increment="0042",
    metrics=[
        "prediction_latency",
        "throughput",
        "error_rate",
        "prediction_distribution",
        "feature_drift",
    ],
)
```
Creates:

```
monitoring/
├── prometheus.yaml
├── grafana-dashboard.json
├── alerts.yaml
└── drift-detector.py
```
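The `drift-detector.py` module backs the `feature_drift` metric. A self-contained sketch of one common approach, the Population Stability Index (PSI), comparing a training baseline against a recent production sample (illustrative, not the generated code):

```python
import numpy as np

def psi(baseline: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two samples of one feature."""
    # Bin edges come from the baseline distribution
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(production, bins=edges)

    # Convert counts to proportions; clip to avoid log(0)
    expected = np.clip(expected / expected.sum(), 1e-6, None)
    actual = np.clip(actual / actual.sum(), 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)  # training baseline
prod_feature = rng.normal(0.3, 1.0, 10_000)   # shifted production sample

# Rule of thumb: PSI > 0.2 indicates significant drift
if psi(train_feature, prod_feature) > 0.2:
    print("ALERT: feature drift detected")
```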
## A/B Testing Infrastructure
```python
from specweave import create_ab_test

# Sets up the A/B test framework
ab_test = create_ab_test(
    control_model="model-v2.pkl",
    treatment_model="model-v3.pkl",
    traffic_split=0.1,  # 10% to new model
    success_metric="click_through_rate",
    increment="0042",
)
```
Creates:

```
ab-test/
├── router.py               (traffic splitting)
├── metrics.py              (success tracking)
├── statistical-tests.py    (significance testing)
└── dashboard.py            (real-time monitoring)
```
A/B Test Router:

```python
def route_prediction(user_id, features, control_model, treatment_model):
    """Route to control or treatment based on a hash of user_id."""
    # Consistent hashing (the same user always gets the same model)
    user_bucket = hash(user_id) % 100

    if user_bucket < 10:  # 10% to treatment
        return treatment_model.predict(features), "treatment"
    else:
        return control_model.predict(features), "control"
```
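One caveat: Python's built-in `hash()` is randomized per process for strings (via `PYTHONHASHSEED`), so the bucketing above is only stable across replicas when `user_id` is an integer. A sketch of a deterministic variant using `hashlib`:

```python
import hashlib

def stable_bucket(user_id, buckets: int = 100) -> int:
    """Deterministic bucket assignment, stable across processes and replicas."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets
```

With `stable_bucket(user_id) < 10`, the same 10% of users is always routed to treatment, no matter which replica serves the request.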
## Model Versioning
```python
from specweave import ModelVersion

# Register a model version
version = ModelVersion.register(
    model_path="models/model-v3.pkl",
    increment="0042",
    metadata={
        "accuracy": 0.87,
        "training_date": "2024-01-15",
        "data_version": "v2024-01",
        "framework": "xgboost==1.7.0",
    },
)

# Easy rollback
if production_metrics["error_rate"] > threshold:
    ModelVersion.rollback(to_version="0042-v2")
```
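Under the hood, a registry like this typically persists a small manifest per version. An illustrative sketch of the shape of such a record; the field names and file layout are assumptions, not SpecWeave's actual storage format:

```python
import json
from pathlib import Path

# Hypothetical manifest; field names are assumptions, not SpecWeave's schema
manifest = {
    "version": "0042-v3",
    "model_path": "models/model-v3.pkl",
    "metadata": {
        "accuracy": 0.87,
        "training_date": "2024-01-15",
        "data_version": "v2024-01",
        "framework": "xgboost==1.7.0",
    },
    "previous_version": "0042-v2",  # what rollback() falls back to
}
Path("models/manifests").mkdir(parents=True, exist_ok=True)
Path("models/manifests/0042-v3.json").write_text(json.dumps(manifest, indent=2))
```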
## Load Testing
```python
from specweave import load_test_model

# Benchmark model performance
results = load_test_model(
    api_url="http://localhost:8000/predict",
    requests_per_second=[10, 50, 100, 500, 1000],
    duration_seconds=60,
    increment="0042",
)
```
Output:
Load Test Results:
| RPS | Latency P50 | Latency P95 | Latency P99 | Error Rate |
|---|---|---|---|---|
| 10 | 35ms | 45ms | 50ms | 0.00% |
| 50 | 38ms | 52ms | 65ms | 0.00% |
| 100 | 45ms | 70ms | 95ms | 0.02% |
| 500 | 120ms | 250ms | 400ms | 1.20% |
| 1000 | 350ms | 800ms | 1200ms | 8.50% |
Recommendation: deploy with at most 100 RPS per instance. Target: <100ms P95 latency (achieved at 100 RPS).
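Percentile latencies like these can be reproduced with a small harness. A minimal sketch using `requests` and a thread pool; a stand-in for the generated load-test code, not its actual implementation, and the concurrency here only approximates a target RPS:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import requests

def timed_request(_):
    start = time.perf_counter()
    ok = True
    try:
        requests.post(
            "http://localhost:8000/predict",
            json={"user_id": 1, "context": {}},
            timeout=5,
        ).raise_for_status()
    except requests.RequestException:
        ok = False
    return (time.perf_counter() - start) * 1000, ok  # latency in ms

# ~100 concurrent requests for ~6000 total (roughly 100 RPS over 60s)
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(timed_request, range(6000)))

latencies = np.array([latency for latency, _ in results])
errors = sum(1 for _, ok in results if not ok)
print(f"P50={np.percentile(latencies, 50):.0f}ms "
      f"P95={np.percentile(latencies, 95):.0f}ms "
      f"error_rate={errors / len(results):.2%}")
```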
## Deployment Commands

```bash
# Generate deployment artifacts
/ml:deploy-prepare 0042

# Create API
/ml:create-api --increment 0042 --framework fastapi

# Setup monitoring
/ml:setup-monitoring 0042

# Create A/B test
/ml:create-ab-test --control v2 --treatment v3 --split 0.1

# Load test
/ml:load-test 0042 --rps 100 --duration 60s

# Deploy to production
/ml:deploy 0042 --environment production
```
## Deployment Increment

The skill creates a deployment increment:

```
.specweave/increments/0043-deploy-recommendation-model/
├── spec.md          (deployment requirements)
├── plan.md          (deployment strategy)
├── tasks.md
│   ├── [ ] Containerize model
│   ├── [ ] Create API
│   ├── [ ] Setup monitoring
│   ├── [ ] Configure A/B test
│   ├── [ ] Load test
│   ├── [ ] Deploy to staging
│   ├── [ ] Validate staging
│   └── [ ] Deploy to production
├── api/             (FastAPI app)
├── monitoring/      (Grafana dashboards)
├── ab-test/         (A/B testing logic)
└── load-tests/      (Performance benchmarks)
```
## Best Practices

- Always load test before production
- Start with 1-5% traffic in an A/B test
- Monitor model drift in production
- Version everything (model, data, code)
- Document the rollback plan before deploying
- Set up alerts for anomalies
- Gradual rollout (canary deployment); see the sketch below
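For the last point, a canary rollout ramps traffic to the new model in stages and backs off when errors spike. A minimal sketch of the gating logic, with illustrative stages and thresholds:

```python
# Traffic fraction for the new model at each canary stage (illustrative)
CANARY_STAGES = [0.01, 0.05, 0.25, 0.50, 1.00]
ERROR_BUDGET = 0.01  # abort if the canary's error rate exceeds 1%

def next_traffic_split(stage: int, canary_error_rate: float) -> float:
    """Advance to the next stage if healthy, otherwise roll back to 0%."""
    if canary_error_rate > ERROR_BUDGET:
        return 0.0  # roll back: all traffic to the stable model
    return CANARY_STAGES[min(stage + 1, len(CANARY_STAGES) - 1)]
```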
## Integration with SpecWeave

```bash
# After training the model (increment 0042)
/sw:inc "0043-deploy-recommendation-model"

# Generates deployment increment with all artifacts
/sw:do

# Deploy to production when ready
/ml:deploy 0043 --environment production
```
Model deployment is not the end—it's the beginning of the MLOps lifecycle.